Abstract

Solving stochastic optimization problems under partial observability, where one needs to adaptively make 
decisions with uncertain outcomes, is a fundamental but notoriously difficult challenge. In this paper, we 
introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We 
prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive 
with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and 
coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy 
evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular 
objectives arising in diverse applications including sensor placement, viral marketing and active learning. 
Proving adaptive submodularity for these problems allows us to recover existing results in these applications as 
special cases, improve approximation guarantees and handle natural generalizations. 

Keywords: Adaptive Optimization, Stochastic Optimization, Submodularity, Partial Observability, Active 
Learning, Optimal Decision Trees 



1. Introduction 



In many practical optimization problems one needs to adaptively make a sequence of decisions, taking into
account observations about the outcomes of past decisions. Often these outcomes are uncertain, and one may
only know a probability distribution over them. Finding optimal policies for decision making in such partially
observable stochastic optimization problems is notoriously intractable (see, e.g., Littman et al. (1998)). In this
paper, we introduce the concept of adaptive submodularity, and prove that if a partially observable stochastic
optimization problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to obtain
near-optimal solutions. Adaptive submodularity generalizes the notion of submodularity^1, which has been
successfully used to develop approximation algorithms for a variety of non-adaptive optimization problems.
Submodularity, informally, is an intuitive notion of diminishing returns, which states that adding an element to
a small set helps more than adding that same element to a larger (super-)set. A celebrated result of Nemhauser
et al. (1978) guarantees that for such submodular functions, a simple greedy algorithm, which adds the element
that maximally increases the objective value, selects a near-optimal set of k elements. Similarly, it is guaranteed
to find a set of near-minimal cost that achieves a desired quota of utility (Wolsey 1982), using near-minimum
average time to do so (Streeter and Golovin 2008). Besides guaranteeing theoretical performance bounds,
submodularity allows us to speed up algorithms by using lazy evaluations (Minoux 1978), often leading to
performance improvements of several orders of magnitude (Leskovec et al. 2007). The challenge in generalizing
submodularity to adaptive planning is that feasible solutions are now policies (decision trees) instead of
subsets. We propose a natural generalization of the diminishing returns property for adaptive problems, which
reduces to the classical characterization of submodular set functions for deterministic distributions. We show
how these results of Nemhauser et al., Wolsey, Streeter and Golovin, and Minoux generalize to the adaptive
setting. Hence, we demonstrate how adaptive submodular optimization problems enjoy theoretical
and practical benefits similar to those of classical, nonadaptive submodular problems. We further demonstrate
the usefulness and generality of the concept by showing how it captures known results in stochastic optimization
and active learning as special cases, admits tighter performance bounds, and leads to natural generalizations.

0. An extended abstract of this work appeared in COLT 2010 (Golovin and Krause 2010).

1. For an extensive treatment of submodularity, see the books of Fujishige and Schrijver (2003).

As a first example, consider the problem of deploying a collection of sensors to monitor some spatial 
phenomenon. Each sensor can cover a region depending on its sensing range. Suppose we would like to 
find the best subset of k locations to place the sensors. In this application, intuitively, adding a sensor helps 
more if we have placed few sensors so far and helps less if we have already placed many sensors. We can 
formalize this diminishing returns property using the notion of submodularity - the total area covered by the 
sensors is a submodular function defined over all sets of locations. Krause and Guestrin (2007) show that
many more realistic utility functions in sensor placement (such as the improvement in prediction accuracy
w.r.t. some probabilistic model) are submodular as well. Now consider the following stochastic variant: Instead
of deploying a fixed set of sensors, we deploy one sensor at a time. With a certain probability, deployed sensors 
can fail, and our goal is to maximize the area covered by the functioning sensors. Thus, when deploying the 
next sensor, we need to take into account which of the sensors we deployed in the past failed. This problem 
has been studied by Asadpour et al. (2008) for the case where each sensor fails independently at random. In
this paper, we show that the coverage objective is adaptive submodular, and use this concept to handle more 
general settings (where, e.g., rather than all-or-nothing failures there are different types of sensor failures of 
varying severity). We also consider the related problem of placing the minimum number of sensors to achieve 
the maximum possible sensor coverage (i.e., the coverage obtained by deploying sensors everywhere). This 
problem is equivalent to one studied by Goemans and Vondrák (2006), and generalizes a problem studied by
Liu et al. (2008). As with the maximum coverage version, adaptive submodularity allows us to recover and
generalize previous results.

As another example, consider a viral marketing problem, where we are given a social network, and we 
want to influence as many people as possible in the network to buy some product. We do that by giving the 
product for free to a subset of the people, and hope that they convince their friends to buy the product as well. 
Formally, we have a graph, and each edge $e$ is labeled by a number $0 \le p_e \le 1$. We "influence" a subset
of nodes in the graph, and for each influenced node, their neighbors get randomly influenced according to
the probability annotated on the edge connecting the nodes. This process repeats until no further node gets
influenced. Kempe et al. (2003) show that the set function which quantifies the expected number of nodes
influenced is submodular. A natural stochastic variant of the problem is where we pick a node, get to see 
which nodes it influenced, then adaptively pick the next node based on these observations and so on. We show 
that a large class of such adaptive influence maximization problems satisfies adaptive submodularity. 

Our third application is in pool-based active learning, where we are given an unlabeled data set, and we 
would like to adaptively pick a small set of examples whose labels imply all other labels. Thus, we want to pick 
examples to shrink the remaining version space (the set of consistent hypotheses) as quickly as possible. Here, 
we show that the reduction in version space probability mass is adaptive submodular, and use that observation 
to prove that the adaptive greedy algorithm is a near-optimal querying policy, recovering and generalizing 
results by Kosaraju et al. (1999) and Dasgupta (2004). Our results for active learning are also related to recent
results of Guillory and Bilmes (2010), who study a generalization of submodular set cover to an interactive
setting. In contrast to our approach, however, Guillory and Bilmes analyze worst-case costs, and use rather
different technical definitions and proof techniques. 

We summarize our main contributions below, and provide a more technical summary in Table 1. At a high
level, our main contributions are: 

• We consider a particular class of partially observable adaptive stochastic optimization problems, which 
we prove to be hard to approximate in general. 
Name                          New Results
----------------------------  ------------------------------------------------------------------
A.S. Maximization             Tight (1 - 1/e)-approx. for adaptive monotone submodular
                              objectives, §5.1
A.S. Min Cost Coverage        Tight logarithmic approx. for adaptive monotone submodular
                              objectives, §5.2
A.S. Min Sum Cover            Tight 4-approx. for adaptive monotone submodular objectives, §5.3
Data Dependent Bounds         Generalization of the data-dependent bounds for submodular
                              functions, §5.1
Accelerated Adaptive Greedy   Generalization of lazy evaluations to the adaptive setting, §4
Stochastic Submodular         ... distributions, and to item costs, §6
Maximization
Stochastic Set Cover          Generalization of the previous (ln(n) + 1)-approx. to arbitrary
                              per-item set distributions, with item costs, §7
Adaptive Viral Marketing      Adaptive analog of previous (1 - 1/e)-approx. for non-adaptive
                              viral marketing, under more general reward functions; tight
                              logarithmic approx. for the adaptive min cost cover version, §8
Active Learning               Improved approx. factor of generalized binary search and its
                              approximate versions with and without item costs, §9
Hardness in the absence of    $\Omega(|E|^{1-\varepsilon})$-approximation hardness for A.S.
Adaptive Submodularity        Maximization, Min Cost Coverage, and Min-Sum Cover, if $f$ is not
                              adaptive submodular, §12

Table 1: Summary of our theoretical results. A.S. is shorthand for "Adaptive Stochastic".



• We introduce the concept of adaptive submodularity, and prove that if a problem instance satisfies 
this property, a simple adaptive greedy policy performs near-optimally, for both adaptive stochastic 
maximization and coverage, and also a natural min-sum objective. 

• We show how adaptive submodularity can be exploited by allowing the use of an accelerated adaptive 
greedy algorithm using lazy evaluations, and how we can obtain tight, data-dependent bounds. 

• We illustrate adaptive submodularity on several realistic problems, including Stochastic Maximum 
Coverage, Stochastic Submodular Coverage, Adaptive Viral Marketing, and Active Learning. For these 
applications, adaptive submodularity allows us to recover known results and prove natural generalizations.



Organization. This article is organized as follows. In §2 we set up notation and formally define the relevant
adaptive optimization problems for general objective functions. For the reader's convenience, we have
also provided a reference table of important symbols on page 46. In §3 we review the classical notion of
submodularity and introduce the novel adaptive submodularity property. In §4 we introduce the adaptive greedy
policy, as well as an accelerated variant. In §5 we discuss the theoretical guarantees that the adaptive greedy
policy enjoys when applied to problems with adaptive submodular objectives. Sections 6 through 9 provide
examples of how to apply the adaptive submodular framework to various applications, namely Stochastic
Submodular Maximization (§6), Stochastic Submodular Coverage (§7), Adaptive Viral Marketing (§8), and
Pool-based Active Learning (§9). In §10 we report empirical results on two sensor selection problems. In §11
we discuss the adaptivity gap of the problems we consider, and in §12 we prove hardness results indicating that
problems which are not adaptive submodular can be extremely inapproximable under reasonable complexity
assumptions. We review related work in §13 and provide concluding remarks in §14. The Appendix gives
details of how to incorporate item costs and includes all of the proofs omitted from the main text.



2. Adaptive Stochastic Optimization 

We start by introducing notation and defining the general class of adaptive optimization problems that we 
address in this paper. 
Figure 1: Illustration of a policy $\pi$, its corresponding decision tree representation, and the decision tree
representation of $\pi_{[2]}$, the level 2 truncation of $\pi$ (as defined in §5.1).



Items and Realizations. Let $E$ be a finite set of items. Each item $e \in E$ is in a particular (initially unknown)
state $\Phi(e) \in O$ from a set $O$ of possible states. Hereby, $\Phi : E \to O$ is a (random) realization of the ground
set, indicating which state each item is in. We take a Bayesian approach and assume that there is a (known)
probability distribution $P[\Phi]$ over realizations. We will consider problems where we sequentially pick an
item $e \in E$, get to see its state $\Phi(e)$, pick the next item, get to see its state, and so on. After each pick, our
observations so far can be represented as a partial realization $\Psi$, a function from some subset of $E$ (i.e., the
set of items that we already picked) to their states. For notational convenience, we sometimes represent $\Psi$ as a
relation, so that $\Psi \subseteq E \times O$ equals $\{(e, o) : \Psi(e) = o\}$. We use the notation $\mathrm{dom}(\Psi) = \{e : \exists o.\,(e, o) \in \Psi\}$
to refer to the domain of $\Psi$ (i.e., the set of items observed in $\Psi$). A partial realization $\Psi$ is consistent with a
realization $\Phi$ if they are equal everywhere in the domain of $\Psi$. In this case we write $\Phi \sim \Psi$. If $\Psi$ and $\Psi'$ are
both consistent with some $\Phi$, and $\mathrm{dom}(\Psi) \subseteq \mathrm{dom}(\Psi')$, we say $\Psi$ is a subrealization of $\Psi'$. Equivalently, $\Psi$ is
a subrealization of $\Psi'$ if and only if $\Psi \subseteq \Psi'$.
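To make these definitions concrete, the following minimal Python sketch represents realizations and partial realizations as dictionaries from items to states. The three item names, the binary state set, and the helper names `dom`, `consistent`, and `is_subrealization` are hypothetical choices made for this illustration; they are not notation from the paper.

```python
from itertools import product

ITEMS = ["a", "b", "c"]   # ground set E (hypothetical example)
STATES = [0, 1]           # state set O (hypothetical example)

# Every realization Phi : E -> O, one dictionary per assignment of states.
REALIZATIONS = [dict(zip(ITEMS, s)) for s in product(STATES, repeat=len(ITEMS))]


def dom(psi):
    """Domain of a partial realization: the set of items observed so far."""
    return set(psi)


def consistent(phi, psi):
    """True if realization phi agrees with psi everywhere on dom(psi)."""
    return all(phi[e] == o for e, o in psi.items())


def is_subrealization(psi, psi_prime):
    """True if psi, viewed as a set of (item, state) pairs, is contained in psi_prime."""
    return set(psi.items()) <= set(psi_prime.items())


psi = {"a": 1}                 # observed: item a is in state 1
psi_prime = {"a": 1, "c": 0}   # additionally observed: item c is in state 0
compatible = [phi for phi in REALIZATIONS if consistent(phi, psi_prime)]
```

Here `compatible` holds the realizations that remain possible after observing `psi_prime`; only the state of item b is still undetermined.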

Policies. We encode our adaptive strategy for picking items as a policy $\pi$, which is a function from a set
of partial realizations to $E$, specifying which item to pick next under a particular set of observations. If
$\Psi \notin \mathrm{dom}(\pi)$, the policy terminates (stops picking items) upon observation of $\Psi$. Technically, we require
that the domain of $\pi$ is closed under subrealizations. That is, if $\Psi' \in \mathrm{dom}(\pi)$ and $\Psi$ is a subrealization of
$\Psi'$ then $\Psi \in \mathrm{dom}(\pi)$. We also allow randomized policies that are functions from a set of partial realizations to
distributions on $E$. We use the notation $E(\pi, \Phi)$ to refer to the set of items selected by $\pi$ under realization
$\Phi$. Each deterministic policy $\pi$ can be associated with a decision tree $T^{\pi}$ in a natural way (see Fig. 1 for an
illustration). Here, we adopt a policy-centric view that admits concise notation, though we find the decision
tree view to be valuable conceptually.

Adaptive Stochastic Maximization, Coverage, and Min-Sum Coverage. We wish to maximize, subject
to some constraints, a utility function $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ that depends on which items we pick and which
state each item is in. Based on this notation, the expected utility of a policy $\pi$ is $f_{\mathrm{avg}}(\pi) := \mathbb{E}[f(E(\pi, \Phi), \Phi)]$
where the expectation is taken with respect to $P[\Phi]$. The goal of the Adaptive Stochastic Maximization problem
is to find a policy $\pi^*$ such that

$\pi^* \in \arg\max_{\pi} f_{\mathrm{avg}}(\pi)$ subject to $|E(\pi, \Phi)| \le k$ for all $\Phi$, (1)

where $k$ is a budget on how many items can be picked.
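The expected utility of a policy can be computed by brute-force enumeration when the instance is tiny. The sketch below assumes a hypothetical two-sensor coverage instance in which each sensor works independently with probability `P_WORK`; a policy is modeled as a Python function that maps a partial realization to the next item, or to `None` when it terminates. All names and numbers are illustrative assumptions, not part of the paper.

```python
from itertools import product

ITEMS = ["a", "b"]                    # hypothetical sensors
P_WORK = 0.5                          # assumed probability that a sensor works
COVER = {"a": {1, 2}, "b": {2, 3}}    # assumed regions covered by each sensor


def prob(phi):
    """P[Phi = phi] under independent sensor failures (an assumption)."""
    p = 1.0
    for e in ITEMS:
        p *= P_WORK if phi[e] == 1 else 1.0 - P_WORK
    return p


def f(A, phi):
    """Utility: number of locations covered by the working sensors in A."""
    covered = set()
    for e in A:
        if phi[e] == 1:
            covered |= COVER[e]
    return len(covered)


def run_policy(policy, phi):
    """Return E(pi, phi), the set of items the policy picks under phi."""
    psi = {}
    while True:
        e = policy(psi)
        if e is None:           # psi is outside dom(pi): the policy stops
            return set(psi)
        psi[e] = phi[e]         # observe the state of the picked item


def f_avg(policy):
    """Expected utility of the policy, by enumerating all realizations."""
    total = 0.0
    for s in product([0, 1], repeat=len(ITEMS)):
        phi = dict(zip(ITEMS, s))
        total += prob(phi) * f(run_policy(policy, phi), phi)
    return total


def pick_a_then_b_if_failed(psi):
    """Example policy: pick a; pick b only if a was observed to fail."""
    if "a" not in psi:
        return "a"
    if psi["a"] == 0 and "b" not in psi:
        return "b"
    return None
```

This example policy never picks more than two items, so it is feasible for a budget of two under every realization.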

Alternatively, we can specify a quota $Q$ of utility that we would like to obtain, and try to find the cheapest
policy achieving that quota. Formally, we define the average cost $c_{\mathrm{avg}}(\pi)$ of a policy as the expected number
of items it picks, so that $c_{\mathrm{avg}}(\pi) := \mathbb{E}[|E(\pi, \Phi)|]$. Our goal is then to find

$\pi^* \in \arg\min_{\pi} c_{\mathrm{avg}}(\pi)$ such that $f(E(\pi, \Phi), \Phi) \ge Q$ for all $\Phi$, (2)

i.e., the policy $\pi^*$ that minimizes the expected number of items picked such that under all possible realizations,
at least utility $Q$ is achieved. We call Problem (2) the Adaptive Stochastic Minimum Cost Cover problem. We
will also consider the problem where we want to minimize the worst-case cost $c_{\mathrm{wc}}(\pi) := \max_{\Phi} |E(\pi, \Phi)|$.
This worst-case cost $c_{\mathrm{wc}}(\pi)$ is the cost incurred under adversarially chosen realizations, or equivalently the
depth of the deepest leaf in $T^{\pi}$, the decision tree associated with $\pi$.

Yet another important variant is to minimize the average time required by a policy to obtain its utility.
Formally, let $u(\pi, t)$ be the expected utility obtained by $\pi$ after $t$ steps, let $Q = \mathbb{E}[f(E, \Phi)]$ be the maximum
possible expected utility, and define the min-sum cost $c_{\Sigma}(\pi)$ of a policy as $c_{\Sigma}(\pi) := \sum_{t=0}^{\infty} (Q - u(\pi, t))$. We
then define the Adaptive Stochastic Min-Sum Cover problem as the search for

$\pi^* \in \arg\min_{\pi} c_{\Sigma}(\pi)$. (3)

Unfortunately, as we will show in §12, even for linear functions $f$, i.e., those where $f(A, \Phi) = \sum_{e \in A} w_{e,\Phi}$
is simply the sum of weights (depending on the realization $\Phi$), Problems (1), (2), and (3) are hard to approximate
under reasonable complexity theoretic assumptions. Despite the hardness of the general problems, in the
following sections we will identify conditions that are sufficient to allow us to approximately solve them.

Incorporating Item Costs. Instead of quantifying the cost of a set $E(\pi, \Phi)$ by the number of elements
$|E(\pi, \Phi)|$, we can also consider the case where each item $e \in E$ has a cost $c(e)$, and the cost of a set $S \subseteq E$ is
$c(S) = \sum_{e \in S} c(e)$. We can then consider variants of Problems (1), (2), and (3) with $|E(\pi, \Phi)|$ replaced by
$c(E(\pi, \Phi))$. For clarity of presentation, we will focus on the unit cost case, i.e., $c(e) = 1$ for all $e$, and explain
how our results generalize to the non-uniform case in the Appendix.



3. Adaptive Submodularity 

We first review the classical notion of submodular set functions, and then introduce the novel notion of adaptive 
submodularity. 

3.1 Background on Submodularity 

Let us first consider the simple special case where $P[\Phi]$ is deterministic or, equivalently, $|O| = 1$. In this case,
the realization $\Phi$ is known to the decision maker in advance, and thus there is no benefit in adaptive selection.
Thus, Problem (1) is equivalent to finding a set $A^* \subseteq E$ such that

$A^* \in \arg\max_{A \subseteq E} f(A, \Phi)$ such that $|A| \le k$. (4)

For most interesting classes of utility functions $f$, this is an NP-hard optimization problem. However, in many
practical problems, such as those mentioned in §1, $f(A) = f(A, \Phi)$ satisfies submodularity. A set function
$f : 2^E \to \mathbb{R}$ is called submodular if, whenever $A \subseteq B \subseteq E$ and $e \in E \setminus B$ it holds that

$f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$, (5)

i.e., adding $e$ to the smaller set $A$ increases $f$ by at least as much as adding $e$ to the superset $B$. Furthermore,
$f$ is called monotone, if, whenever $A \subseteq B$ it holds that $f(A) \le f(B)$. A celebrated result by Nemhauser et al.
(1978) states that for monotone submodular functions with $f(\emptyset) = 0$, a simple greedy algorithm that starts
with the empty set, $A_0 = \emptyset$, and chooses

$A_{i+1} = A_i \cup \{\arg\max_{e \in E \setminus A_i} f(A_i \cup \{e\})\}$ (6)

guarantees that $f(A_k) \ge (1 - 1/e) \max_{|A| \le k} f(A)$. Thus, the greedy set $A_k$ obtains at least a $(1 - 1/e)$ fraction
of the optimal value achievable using $k$ elements. Furthermore, Feige (1998) shows that this result is tight
if $P \ne NP$; under this assumption no polynomial time algorithm can achieve a $(1 - 1/e + \varepsilon)$-approximation for
any constant $\varepsilon > 0$, even for the special case of Maximum $k$-Cover where $f(A)$ is the cardinality of the union
of sets indexed by $A$. Similarly, Wolsey (1982) shows that the same greedy algorithm also near-optimally
solves the deterministic case of Problem (2), called the Minimum Submodular Cover problem:

$A^* \in \arg\min_{A \subseteq E} |A|$ such that $f(A) \ge Q$. (7)

Pick the first set $A_{\ell}$ constructed by the greedy algorithm such that $f(A_{\ell}) \ge Q$. Then, for integer-valued
submodular functions, $\ell$ is at most $|A^*|(1 + \log(\max_e f(e)))$, i.e., the greedy set is at most a logarithmic
factor larger than the smallest set achieving quota $Q$. For the special case of Set Cover, where $f(A)$ is the
cardinality of a union of sets indexed by $A$, this result matches a lower bound by Feige (1998): Unless
$NP \subseteq DTIME(n^{O(\log \log n)})$, Set Cover is hard to approximate by a factor better than $(1 - \varepsilon) \ln Q$, where $Q$
is the number of elements to be covered.
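The classical greedy rule (6) is easy to state in code for the Maximum k-Cover special case. The following is a minimal sketch on a hypothetical instance; the set names and contents are invented for illustration, and ties are broken by dictionary order, which the paper does not prescribe.

```python
def greedy_max_cover(sets, k):
    """Pick up to k sets, each time adding the one that covers the most new elements."""
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [name for name in sets if name not in chosen]
        if not candidates:
            break
        # Marginal gain of a set = number of elements it adds to the current cover.
        best = max(candidates, key=lambda name: len(sets[name] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical instance with six ground elements.
SETS = {"s1": {1, 2, 3, 4}, "s2": {3, 4, 5}, "s3": {5, 6}, "s4": {1, 6}}
chosen, covered = greedy_max_cover(SETS, 2)
```

On this instance the marginal gain of `s2` drops from 3 to 1 once `s1` is chosen, which is the diminishing returns behavior that inequality (5) formalizes.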

Now let us relax the assumption that $P[\Phi]$ is deterministic. In this case, we may still want to find a
non-adaptive solution (i.e., a constant policy $\pi_A$ that always picks set $A$ independently of $\Phi$) maximizing $f_{\mathrm{avg}}(\pi_A)$.
If $f$ is pointwise submodular, i.e., $f(A, \Phi)$ is submodular in $A$ for any fixed $\Phi$, the function $f(A) = f_{\mathrm{avg}}(\pi_A)$
is submodular, since nonnegative linear combinations of submodular functions remain submodular. Thus, the
greedy algorithm allows us to find a near-optimal non-adaptive policy.

However, in practice, we may be more interested in obtaining a non-constant policy $\pi$ that adaptively
chooses items based on previous observations. Thus, the question is whether there is a natural extension of
submodularity to policies. In the following, we will develop such a notion: adaptive submodularity.

3.2 Adaptive Monotonicity and Submodularity 

The key challenge is to find appropriate generalizations of monotonicity and of the diminishing returns
condition (5). We begin by considering the simple special case where $P[\Phi]$ is deterministic, so that the policies
are non-adaptive. In this case a policy $\pi$ simply specifies a sequence of items $(e_1, e_2, \ldots, e_r)$ which it selects in
order. Monotonicity in this context can be characterized as the property that "the marginal benefit of selecting
an item is always nonnegative," meaning that for all such sequences $(e_1, e_2, \ldots, e_r)$, items $e$ and $1 \le i \le r$ it
holds that $f(\{e_j : j \le i\} \cup \{e\}) - f(\{e_j : j \le i\}) \ge 0$. Similarly, submodularity can be viewed as the property
that "selecting an item later never increases its marginal benefit," meaning that for all sequences $(e_1, e_2, \ldots, e_r)$,
items $e$, and all $i \le r$, $f(\{e_j : j \le i\} \cup \{e\}) - f(\{e_j : j \le i\}) \ge f(\{e_j : j \le r\} \cup \{e\}) - f(\{e_j : j \le r\})$.

We take these views of monotonicity and submodularity when defining their adaptive analogues, by using 
an appropriate generalization of the marginal benefit. When moving to the general adaptive setting, the 
challenge is that the items' states are now random and only revealed upon selection. A natural approach is
thus to condition on observations (i.e., partial realizations of selected items), and take the expectation with 
respect to the items that we consider selecting. Hence, we define our adaptive monotonicity and submodularity 
properties in terms of the conditional expected marginal benefit of an item. 

Definition 1 (Conditional Expected Marginal Benefit) Given a partial realization $\Psi$ and an item $e$, the
conditional expected marginal benefit of $e$ conditioned on having observed $\Psi$, denoted $\Delta(e \mid \Psi)$, is

$\Delta(e \mid \Psi) := \mathbb{E}[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]$ (8)

where the expectation is taken with respect to $P[\Phi]$. Similarly, the conditional expected marginal benefit of a
policy $\pi$ is

$\Delta(\pi \mid \Psi) := \mathbb{E}[f(\mathrm{dom}(\Psi) \cup E(\pi, \Phi), \Phi) - f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]$. (9)
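For small instances, the conditional expected marginal benefit of an item can be evaluated exactly by averaging over all realizations consistent with the observations. The sketch below assumes a hypothetical two-sensor coverage instance with independent failures; `P_WORK`, `COVER`, and the function names are illustrative assumptions.

```python
from itertools import product

ITEMS = ["a", "b"]                    # hypothetical sensors
P_WORK = 0.5                          # assumed probability that a sensor works
COVER = {"a": {1, 2}, "b": {2, 3}}    # assumed regions covered by each sensor


def prob(phi):
    """P[Phi = phi] under independent sensor failures (an assumption)."""
    p = 1.0
    for e in ITEMS:
        p *= P_WORK if phi[e] == 1 else 1.0 - P_WORK
    return p


def f(A, phi):
    """Utility: number of locations covered by the working sensors in A."""
    covered = set()
    for e in A:
        if phi[e] == 1:
            covered |= COVER[e]
    return len(covered)


def delta(e, psi):
    """Exact conditional expected marginal benefit of item e given observations psi."""
    observed = set(psi)
    num = den = 0.0
    for s in product([0, 1], repeat=len(ITEMS)):
        phi = dict(zip(ITEMS, s))
        if any(phi[x] != o for x, o in psi.items()):
            continue                  # phi is not consistent with psi
        w = prob(phi)
        num += w * (f(observed | {e}, phi) - f(observed, phi))
        den += w                      # accumulates the probability of observing psi
    return num / den
```

In this toy instance, observing that sensor a works halves the benefit of sensor b (location 2 is already covered), while observing that a failed leaves it unchanged.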

We are now ready to introduce our generalizations of monotonicity and submodularity to the adaptive setting: 

Definition 2 (Adaptive Monotonicity) A function $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive monotone with respect
to distribution $P[\Phi]$ if the conditional expected marginal benefit of any item is nonnegative, i.e., for all $\Psi$ with
$P[\Psi] > 0$ and all $e \in E$ we have

$\Delta(e \mid \Psi) \ge 0$. (10)

Definition 3 (Adaptive Submodularity) A function $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive submodular with
respect to distribution $P[\Phi]$ if the conditional expected marginal benefit of any fixed item does not increase as
more items are selected and their states are observed. Formally, $f$ is adaptive submodular w.r.t. $P[\Phi]$ if for all
$\Psi$ and $\Psi'$ such that $\Psi$ is a subrealization of $\Psi'$ (i.e., $\Psi \subseteq \Psi'$), and for all $e \in E$, we have

$\Delta(e \mid \Psi') \le \Delta(e \mid \Psi)$. (11)

From the decision tree perspective, the condition $\Delta(e \mid \Psi') \le \Delta(e \mid \Psi)$ amounts to saying that for any
decision tree $T$, if we are at a node $v$ in $T$ which selects an item $e$, and compare the expected marginal benefit
of $e$ selected at $v$ with the expected marginal benefit $e$ would have obtained if it were selected at an ancestor of
$v$ in $T$, then the latter must be no smaller than the former. Note that when comparing the two expected marginal
benefits, both the set of items previously selected, and the distribution over realizations, are different. It is also
worth stressing that adaptive submodularity is defined relative to the distribution $P[\Phi]$ over realizations; it is
possible that $f$ is adaptive submodular with respect to one distribution, but not with respect to another.

We will give concrete examples of adaptive monotone and adaptive submodular functions that arise in
the applications introduced in §1 in §6, §7, §8, and §9. In the Appendix, we will explain how the notion of
adaptive submodularity can be extended to handle non-uniform costs.
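On a tiny instance, conditions (10) and (11) can be checked by brute force over all pairs of partial realizations. The sketch below does this for a hypothetical three-sensor coverage instance with independent failures; every name and number is an assumption made for illustration, and a check on one instance is of course no substitute for a proof.

```python
from itertools import combinations, product

ITEMS = ["a", "b", "c"]                            # hypothetical sensors
P_WORK = 0.7                                       # assumed working probability
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}    # assumed coverage regions


def f(A, phi):
    """Utility: number of locations covered by the working sensors in A."""
    covered = set()
    for e in A:
        if phi[e] == 1:
            covered |= COVER[e]
    return len(covered)


def delta(e, psi):
    """Exact conditional expected marginal benefit of e given psi, by enumeration."""
    observed = set(psi)
    num = den = 0.0
    for s in product([0, 1], repeat=len(ITEMS)):
        phi = dict(zip(ITEMS, s))
        if any(phi[x] != o for x, o in psi.items()):
            continue
        w = 1.0
        for x in ITEMS:
            w *= P_WORK if phi[x] == 1 else 1.0 - P_WORK
        num += w * (f(observed | {e}, phi) - f(observed, phi))
        den += w
    return num / den


def partial_realizations():
    """All partial realizations; each has positive probability in this model."""
    for r in range(len(ITEMS) + 1):
        for items in combinations(ITEMS, r):
            for s in product([0, 1], repeat=r):
                yield dict(zip(items, s))


def is_adaptive_monotone(tol=1e-12):
    return all(delta(e, psi) >= -tol
               for psi in partial_realizations() for e in ITEMS)


def is_adaptive_submodular(tol=1e-12):
    parts = list(partial_realizations())
    for psi_prime in parts:
        for psi in parts:
            # psi is a subrealization of psi_prime iff psi is contained in psi_prime.
            if all(psi_prime.get(x) == o for x, o in psi.items()):
                for e in ITEMS:
                    if delta(e, psi_prime) > delta(e, psi) + tol:
                        return False
    return True
```

Because the check is run against one fixed distribution, changing `P_WORK` or introducing dependence between sensors tests a different instance of the definitions.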

Properties of Adaptive Submodular Functions. It can be seen that adaptive monotonicity and adaptive
submodularity enjoy similar closure properties as monotone submodular functions. In particular, if
$w_1, \ldots, w_m \ge 0$ and $f_1, \ldots, f_m$ are adaptive monotone submodular w.r.t. distribution $P[\Phi]$, then $f(A, \Phi) =
\sum_{i=1}^{m} w_i f_i(A, \Phi)$ is adaptive monotone submodular w.r.t. $P[\Phi]$. Similarly, for a fixed constant $c \ge 0$ and
adaptive monotone submodular function $f$, the function $g(E, \Phi) = \min(f(E, \Phi), c)$ is adaptive monotone
submodular. Thus, adaptive monotone submodularity is preserved by nonnegative linear combinations and by
truncation. Adaptive monotone submodularity is also preserved by restriction, so that if $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$
is adaptive monotone submodular w.r.t. $P[\Phi]$, then for any $e \in E$, the function $g : 2^{E \setminus \{e\}} \times O^E \to \mathbb{R}_{\ge 0}$
defined by $g(A, \Phi) := f(A, \Phi)$ for all $A \subseteq E \setminus \{e\}$ and all $\Phi$ is also adaptive submodular w.r.t. $P[\Phi]$. Finally,
if $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive monotone submodular w.r.t. $P[\Phi]$ then for each partial realization $\Psi$ the
conditional function $g(A, \Phi) := f(A \cup \mathrm{dom}(\Psi), \Phi)$ is adaptive monotone submodular w.r.t. $P[\Phi \mid \Psi]$.



4. The Adaptive Greedy Policy 

The classical non-adaptive greedy algorithm (6) has a natural generalization to the adaptive setting. The greedy
policy $\pi^{\mathrm{greedy}}$ tries, at each iteration, to myopically increase the expected objective value, given its current
observations. That is, suppose $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is the objective, and $\Psi$ is the partial realization indicating
the states of items selected so far. Then the greedy policy will select the item $e$ maximizing the expected increase
in value, conditioned on the observed states of items it has already selected (i.e., conditioned on $\Phi \sim \Psi$). That is,
it will select $e$ to maximize the conditional expected marginal benefit $\Delta(e \mid \Psi)$ as defined in Eq. (8). Pseudocode
of the adaptive greedy algorithm is given in Algorithm 1. The only difference to the classic, non-adaptive
greedy algorithm studied by Nemhauser et al. (1978) is Line 1, where an observation $\Phi(e^*)$ of the selected
item $e^*$ is obtained. Note that the algorithms in this section are presented for Adaptive Stochastic Maximization.
For the coverage objectives, we simply keep selecting items as prescribed by $\pi^{\mathrm{greedy}}$ until achieving the quota
on objective value (for the min-cost objective) or until we have selected every item (for the min-sum objective).

Incorporating Item Costs. The adaptive greedy algorithm can be naturally modified to handle non-uniform
item costs by replacing its selection rule by

$e^* \in \arg\max_{e} \frac{\Delta(e \mid \Psi)}{c(e)}$.

In the following, we will focus on the uniform cost case ($c \equiv 1$), and defer the analysis with costs to the
Appendix.
Approximate Greedy Selection. In some applications, finding an item maximizing $\Delta(e \mid \Psi)$ may be
computationally intractable, and the best we can do is find an $\alpha$-approximation to the best greedy selection. This
means we find an $e'$ such that

$\Delta(e' \mid \Psi) \ge \frac{1}{\alpha} \max_{e} \Delta(e \mid \Psi)$.

We call a policy which always selects such an item an $\alpha$-approximate greedy policy.



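Input: Budget $k$; ground set $E$; distribution $P[\Phi]$; function $f$.
Output: Set $A \subseteq E$ of size $k$
begin
    $A \leftarrow \emptyset$; $\Psi \leftarrow \emptyset$;
    for $i = 1$ to $k$ do
        foreach $e \in E \setminus A$ do compute $\Delta(e \mid \Psi) = \mathbb{E}[f(A \cup \{e\}, \Phi) - f(A, \Phi) \mid \Phi \sim \Psi]$;
        Select $e^* \in \arg\max_{e} \Delta(e \mid \Psi)$;
        Set $A \leftarrow A \cup \{e^*\}$;
1       Observe $\Phi(e^*)$; Set $\Psi \leftarrow \Psi \cup \{(e^*, \Phi(e^*))\}$;
end

Algorithm 1: The adaptive greedy algorithm, which implements the greedy policy.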
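A direct transcription of the adaptive greedy loop into Python might look as follows. This is a sketch under stated assumptions: the three-sensor instance, the independent failure model, the exact enumeration in `delta`, and the `observe` callback that reveals an item's state are all hypothetical stand-ins chosen so the example is self-contained.

```python
from itertools import product

P_WORK = 0.5                                            # assumed working probability
COVER = {"a": {1, 2, 3, 4}, "b": {1, 2, 3}, "c": {5, 6}}  # assumed coverage regions
ITEMS = list(COVER)


def f(A, phi):
    """Utility: number of locations covered by the working sensors in A."""
    covered = set()
    for e in A:
        if phi[e] == 1:
            covered |= COVER[e]
    return len(covered)


def delta(e, psi):
    """Exact conditional expected marginal benefit of e given psi, by enumeration."""
    observed = set(psi)
    num = den = 0.0
    for s in product([0, 1], repeat=len(ITEMS)):
        phi = dict(zip(ITEMS, s))
        if any(phi[x] != o for x, o in psi.items()):
            continue
        w = 1.0
        for x in ITEMS:
            w *= P_WORK if phi[x] == 1 else 1.0 - P_WORK
        num += w * (f(observed | {e}, phi) - f(observed, phi))
        den += w
    return num / den


def adaptive_greedy(items, k, delta, observe):
    """Select up to k items, each maximizing the benefit given what was observed."""
    psi = {}                                   # partial realization observed so far
    for _ in range(k):
        candidates = [e for e in items if e not in psi]
        if not candidates:
            break
        e_star = max(candidates, key=lambda e: delta(e, psi))
        psi[e_star] = observe(e_star)          # the observation step (Line 1)
    return psi


# Run the policy under two fixed realizations that differ only in the state of a.
phi_a_works = {"a": 1, "b": 1, "c": 1}
phi_a_fails = {"a": 0, "b": 1, "c": 1}
run_works = adaptive_greedy(ITEMS, 2, delta, phi_a_works.__getitem__)
run_fails = adaptive_greedy(ITEMS, 2, delta, phi_a_fails.__getitem__)
```

The two runs pick different second items, which is exactly what distinguishes an adaptive policy from a fixed set: after sensor a works, sensor b adds nothing and c is chosen; after a fails, b becomes the better choice.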



Lazy Evaluations and the Accelerated Adaptive Greedy Algorithm. The definition of adaptive
submodularity allows us to implement an "accelerated" version of the adaptive greedy algorithm using lazy
evaluations of marginal benefits as originally suggested for the non-adaptive case by Minoux (1978). The
idea is as follows. Suppose we run $\pi^{\mathrm{greedy}}$ under some fixed realization $\Phi$, and select items $e_1, e_2, \ldots, e_k$. Let
$\Psi_i := \{(e_j, \Phi(e_j)) : j \le i\}$ be the partial realizations observed during the run of $\pi^{\mathrm{greedy}}$. The adaptive greedy
algorithm computes $\Delta(e \mid \Psi_i)$ for all $e \in E$ and $0 \le i < k$, unless $e \in \mathrm{dom}(\Psi_i)$. Naively, the algorithm
thus needs to compute $\Theta(|E|k)$ marginal benefits (which can be expensive to compute). The key insight is
that $i \mapsto \Delta(e \mid \Psi_i)$ is nonincreasing for all $e \in E$, because of the adaptive submodularity of the objective.
Hence, if when deciding which item to select as $e_i$ we know $\Delta(e' \mid \Psi_j) \le \Delta(e \mid \Psi_i)$ for some items $e'$ and $e$
and $j < i$, then we may conclude $\Delta(e' \mid \Psi_i) \le \Delta(e \mid \Psi_i)$ and hence eliminate the need to compute $\Delta(e' \mid \Psi_i)$.
The accelerated version of the adaptive greedy algorithm exploits this observation in a principled manner,
by computing $\Delta(e \mid \Psi_i)$ for items $e$ in decreasing order of the upper bounds known on them, until it finds an
item whose value is at least as great as the upper bounds of all other items. Pseudocode of this version of the
adaptive greedy algorithm is given in Algorithm 2.

In the non-adaptive setting, the use of lazy evaluations has been shown to significantly reduce running
times in practice (Leskovec et al. 2007). We evaluated the naive and accelerated implementations of the
adaptive greedy algorithm on two sensor selection problems, and obtained speedup factors that range from
roughly 4 to 40 for those problems. See §10 for details.



5. Guarantees for the Greedy Policy 

In this section we show that if the objective function is adaptive submodular with respect to the distribution 
describing the environment in which we operate, then the greedy policy and any $\alpha$-approximate greedy policy
inherit precisely the performance guarantees of the greedy and $\alpha$-approximate greedy algorithms for classic
(non-adaptive) submodular maximization and submodular coverage problems, such as Maximum $k$-Cover and
Minimum Set Cover, as well as min-sum submodular coverage problems, such as Min-Sum Set Cover. These
guarantees suggest that adaptive submodularity is the appropriate generalization of submodularity to policies. 
In this section we focus on the unit cost case. In the Appendix we provide the proofs omitted in this section, and 
show how our results extend to non-uniform item costs if we greedily maximize the expected benefit/cost ratio. 
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Input: Budget k; ground set E; distribution P [$]; function /. 

Output: Set Ac E of size k 

begin 

A ^ 0; * -s- 0; Priority Queue Q <- EMPTY_QUEUE; 
foreach e £ E do Q. insert(e, +00); 
for i = 1 to fc do 

(^max < 00; Cinax ^ NULL; 

whUe (Jmax < Q- niaxPriority( ) do 
e <r- Q.popO; 

S^A{e\'f)=E [f{A U {e} , - f{A, $) | $ ^ 
Q. insert(e, S); 
if <5max < S then 

A ^ ^ U {e„iax}; Q- remove(e 

max J 7 

1 Observe $(emax); Set ^' U {(e„iax, $(emax))}; 

end 

Algorithm 2: The accelerated version of the adaptive greedy algorithm. Here, $Q$.insert($e$, $\delta$) inserts $e$
with priority $\delta$, $Q$.pop() removes and returns the item with greatest priority, $Q$.maxPriority() returns
the maximum priority of the elements in $Q$, and $Q$.remove($e$) deletes $e$ from $Q$.
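The lazy evaluation scheme of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the callback signatures `expected_gain(e, psi)` and `observe(e)`, and the toy sensor instance are all hypothetical. Instead of re-inserting each refreshed item and removing the winner afterwards, the sketch holds refreshed items outside the heap until the round ends, which selects the same item. The stale priorities are valid upper bounds only when the objective is adaptive submodular.

```python
import heapq
import itertools
import math


def lazy_adaptive_greedy(items, k, expected_gain, observe):
    """Adaptively pick k items, re-evaluating gains lazily as in Algorithm 2.

    expected_gain(e, psi) returns Delta(e | psi) for the partial realization
    psi (a dict item -> observed state); observe(e) returns the state Phi(e).
    """
    tie = itertools.count()  # tie-breaker so the heap never compares items
    # heapq is a min-heap, so priorities are stored negated; start at +infinity.
    heap = [(-math.inf, next(tie), e) for e in items]
    heapq.heapify(heap)
    psi, order = {}, []
    for _ in range(min(k, len(heap))):
        best_gain, best_item, refreshed = -math.inf, None, []
        # Stop once the best fresh gain is at least every stale upper bound.
        while heap and best_gain < -heap[0][0]:
            _, _, e = heapq.heappop(heap)
            gain = expected_gain(e, psi)
            refreshed.append((gain, e))
            if gain > best_gain:
                best_gain, best_item = gain, e
        # Re-insert refreshed items with their new priorities, except the winner.
        for gain, e in refreshed:
            if e != best_item:
                heapq.heappush(heap, (-gain, next(tie), e))
        psi[best_item] = observe(best_item)
        order.append(best_item)
    return order, psi


# Hypothetical instance: a sensor at e covers cover[e] if it works ("ok").
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}
p_ok = {"a": 0.5, "b": 0.9, "c": 1.0, "d": 0.8}
truth = {"a": "fail", "b": "ok", "c": "ok", "d": "ok"}
calls = []  # records every gain evaluation


def expected_gain(e, psi):
    calls.append(e)
    covered = set().union(*(cover[x] for x, s in psi.items() if s == "ok"))
    return p_ok[e] * len(cover[e] - covered)


order, psi = lazy_adaptive_greedy(list(cover), 3, expected_gain, truth.get)
```

On this toy instance the lazy variant performs 7 gain evaluations, whereas re-evaluating every remaining item in each round would take 4 + 3 + 2 = 9.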



5.1 The Maximum Coverage Objective 

In this section we consider the maximum coverage objective, where the goal is to select k items adaptively to 
maximize their expected value. Before stating our result, we require the following definition. 

Definition 4 (Policy Truncation) For a policy $\pi$, define the level-$k$-truncation $\pi_{[k]}$ of $\pi$ to be the policy
obtained by running $\pi$ until it terminates or until it selects $k$ items, and then terminating. Formally,

$\mathrm{dom}(\pi_{[k]}) = \{\Psi \in \mathrm{dom}(\pi) : |\Psi| < k\}$, and $\pi_{[k]}(\Psi) = \pi(\Psi)$ for all $\Psi \in \mathrm{dom}(\pi_{[k]})$.

We have the following result, which generalizes the classic result of Nemhauser et al. (1978) on maximizing
monotone submodular functions under a cardinality constraint.

Theorem 5 Fix any $\alpha \ge 1$. If $f$ is adaptive monotone and adaptive submodular with respect to the distribution
$P[\Phi]$, and $\pi$ is an $\alpha$-approximate greedy policy, then for all policies $\pi^*$ and positive integers $\ell$ and $k$,

\[ f_{\mathrm{avg}}(\pi_{[\ell]}) > \left(1 - e^{-\ell/\alpha k}\right) f_{\mathrm{avg}}(\pi^*_{[k]}). \]

In particular, with $\ell = k$ this implies any $\alpha$-approximate greedy policy achieves a $\left(1 - e^{-1/\alpha}\right)$ approximation
to the expected reward of the best policy, if both are terminated after running for an equal number of steps.

If the greedy rule can be implemented only with small absolute error rather than small relative error, i.e.,
$\Delta(e' \mid \Psi) \ge \max_e \Delta(e \mid \Psi) - \varepsilon$, an argument similar to that used to prove Theorem 5 shows that

\[ f_{\mathrm{avg}}(\pi_{[\ell]}) \ge \left(1 - e^{-\ell/k}\right) f_{\mathrm{avg}}(\pi^*_{[k]}) - \ell\varepsilon. \]



This is important, since small absolute error can always be achieved (with high probability) whenever $f$ can be
evaluated efficiently, and sampling $P(\Phi \mid \Psi)$ is efficient. In this case, we can approximate

\[ \Delta(e \mid \Psi) \approx \frac{1}{N} \sum_{i=1}^{N} \left[ f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi_i) - f(\mathrm{dom}(\Psi), \Phi_i) \right], \]

where $\Phi_i$ are sampled i.i.d. from $P(\Phi \mid \Psi)$.
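The sampling estimate above can be sketched in Python. This is an illustration only: it assumes independent item states so that sampling from $P(\Phi \mid \Psi)$ is easy, and the helper names (`sample_consistent`, `estimate_gain`), the dictionary layout of `state_probs`, and the toy coverage objective are hypothetical.

```python
import random


def sample_consistent(psi, state_probs, rng):
    """Draw Phi ~ P(Phi | psi), assuming item states are independent."""
    phi = dict(psi)  # observed items keep their observed states
    for e, probs in state_probs.items():
        if e not in phi:
            states, weights = zip(*probs.items())
            phi[e] = rng.choices(states, weights=weights)[0]
    return phi


def estimate_gain(e, psi, f, state_probs, num_samples, rng):
    """Monte Carlo estimate of Delta(e | psi) from num_samples realizations."""
    base = set(psi)  # dom(psi)
    total = 0.0
    for _ in range(num_samples):
        phi = sample_consistent(psi, state_probs, rng)
        total += f(base | {e}, phi) - f(base, phi)
    return total / num_samples


# Hypothetical objective: number of elements covered by working sensors.
cover = {"a": {1, 2, 3}, "b": {3, 4}}
state_probs = {"a": {"ok": 0.5, "fail": 0.5}, "b": {"ok": 1.0, "fail": 0.0}}


def f(A, phi):
    return len(set().union(*(cover[x] for x in A if phi[x] == "ok")))


# Exact value is 0.5 * |{1, 2, 3} - {3, 4}| = 1.0; the estimate is close to it.
estimate = estimate_gain("a", {"b": "ok"}, f, state_probs, 4000, random.Random(0))
```

The number of samples $N$ controls the absolute error $\varepsilon$; standard concentration bounds suggest the error shrinks roughly as $1/\sqrt{N}$ for bounded marginal gains.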
Data Dependent Bounds. For the maximum coverage objective, adaptive submodular functions have
another attractive feature: they allow us to obtain data dependent bounds on the optimum, in a manner similar
to the bounds for the non-adaptive case (Minoux 1978). Consider the non-adaptive problem of maximizing
a monotone submodular function $f : 2^E \to \mathbb{R}_{\ge 0}$ subject to the constraint $|A| \le k$. Let $A^*$ be an optimal
solution, and fix any $A \subseteq E$. Then

\[ f(A^*) \le f(A) + \max_{B : |B| \le k} \sum_{e \in B} \left( f(A \cup \{e\}) - f(A) \right) \tag{12} \]

because setting $B = A^*$ we have $f(A^*) \le f(A \cup B) \le f(A) + \sum_{e \in B} \left( f(A \cup \{e\}) - f(A) \right)$. Note that
unlike the original objective, we can easily compute $\max_{B : |B| \le k} \sum_{e \in B} \left( f(A \cup \{e\}) - f(A) \right)$ by computing
$\delta(e) := f(A \cup \{e\}) - f(A)$ for each $e$, and summing the $k$ largest values. Hence we can quickly compute an
upper bound on our distance from the optimal value, $f(A^*) - f(A)$. In practice, such data-dependent bounds
can be much tighter than the problem-independent performance guarantees of Nemhauser et al. (1978) for the greedy
algorithm. Further note that these bounds hold for any set $A$, not just sets selected by the greedy algorithm.
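The bound (12) amounts to one pass over the ground set followed by a sort. The sketch below is illustrative: the function name `data_dependent_bound` and the toy coverage instance are hypothetical, and the brute-force optimum is computed only to compare against the bound.

```python
from itertools import combinations


def data_dependent_bound(f, ground_set, A, k):
    """Upper bound on f(A*) from inequality (12), for monotone submodular f."""
    base = f(A)
    # delta(e) = f(A + e) - f(A) for every element outside A, largest first.
    gains = sorted((f(A | {e}) - base for e in ground_set - A), reverse=True)
    return base + sum(gains[:k])


# Hypothetical toy instance: f counts the elements covered by the chosen sets.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def f(A):
    return len(set().union(*(cover[e] for e in A)))


E = frozenset(cover)
A = frozenset({"a"})
bound = data_dependent_bound(f, E, A, k=2)
optimum = max(f(frozenset(B)) for B in combinations(E, 2))
```

Here $f(A) = 3$ and the bound certifies $f(A^*) \le 5$, so $A$ is within 2 of the optimum without ever enumerating the subsets of size $k$.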
These data dependent bounds have the following analogue for adaptive monotone submodular functions. 

Lemma 6 (The Adaptive Data Dependent Bound) Suppose we have made observations $\Psi$ after selecting
$\mathrm{dom}(\Psi)$. Let $\pi^*$ be any policy such that $|E(\pi^*, \Phi)| \le k$ for all $\Phi$. Then for adaptive monotone submodular $f$

\[ \Delta(\pi^* \mid \Psi) \le \max_{A \subseteq E,\, |A| \le k} \sum_{e \in A} \Delta(e \mid \Psi). \tag{13} \]

Thus, after running any policy $\pi$, we can efficiently compute a bound on the additional benefit that the optimal
solution $\pi^*$ could obtain beyond the reward of $\pi$. We do that by computing the conditional expected marginal
benefits for all elements $e$, and summing the $k$ largest of them. Note that these bounds can be computed on the
fly when running the greedy algorithm.

5.2 The Min Cost Cover Objective 

Another natural objective is to minimize the number of items selected while ensuring that a sufficient level
of value is obtained. This leads to the Adaptive Stochastic Minimum Cost Coverage problem described
earlier, namely $\pi^* \in \arg\min_\pi c_{\mathrm{avg}}(\pi)$ such that $f(E(\pi, \Phi), \Phi) \ge Q$ for all $\Phi$. Recall that $c_{\mathrm{avg}}(\pi)$ is the
expected cost of $\pi$, which in the unit cost case equals the expected number of items selected by $\pi$, i.e.,
$c_{\mathrm{avg}}(\pi) := \mathbb{E}[|E(\pi, \Phi)|]$. If the objective is adaptive monotone submodular, this is an adaptive version of
Minimum Submodular Cover (described in §3.1), for which the greedy algorithm is known to
give a $(\ln(Q) + 1)$-approximation for integer-valued monotone submodular functions (Wolsey 1982). It
is also related to the Interactive Submodular Set Cover problem studied by Guillory and Bilmes (2010),
which considers the worst-case setting (i.e., there is no distribution over states; instead states are realized
in an adversarial manner). Similar results for active learning have been proved by Kosaraju et al. (1999)
and Dasgupta (2004), as we discuss in more detail in §9.



We assume throughout this section that there exists a quality threshold $Q$ such that $f(E, \Phi) = Q$ for all $\Phi$,
and for all $S \subseteq E$ and all $\Phi$, $f(S, \Phi) \le Q$. Note that, as discussed in Section 3, if we replace $f(S, \Phi)$ by a
new function $g(S, \Phi) = \min(f(S, \Phi), Q')$ for some constant $Q'$, $g$ will be adaptive submodular if $f$ is. Thus,
if $f(E, \Phi)$ varies across realizations, we can instead use the greedy algorithm on the function truncated at
some threshold $Q' \le \min_\Phi f(E, \Phi)$ achievable by all realizations.

In contrast to Adaptive Stochastic Maximization, for the coverage problem additional subtleties arise. In
particular, it is not enough that a policy $\pi$ achieves value $Q$ for the true realization; in order for $\pi$ to terminate,
it also requires a proof of this fact. Formally, we require that $\pi$ covers $f$:

Definition 7 (Coverage) Let $\Psi = \Psi(\pi, \Phi)$ be the partial realization encoding all states observed during the
execution of $\pi$ under true realization $\Phi$. Given $f : 2^E \times O^E \to \mathbb{R}$, we say a policy $\pi$ covers $\Phi$ with respect
to $f$ if $f(\mathrm{dom}(\Psi), \Phi') = f(E, \Phi')$ for all $\Phi' \sim \Psi$. We say that $\pi$ covers $f$ if it covers every realization with
respect to $f$.
Coverage is defined in such a way that upon terminating, $\pi$ might not know which realization in
$\{\Phi' : \Phi' \sim \Psi(\pi, \Phi)\}$ is the true one, but has guaranteed that it has achieved the maximum reward in
every possible case. We obtain results for both the average and worst-case cost objectives.
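The coverage condition can be checked by brute force on a small instance. The sketch below is a hypothetical example, not taken from the paper: the objective `f`, the item names, and the helper names `consistent` and `covers` are all assumptions. It shows a case where the policy has already reached the maximum value for the true realization but cannot yet prove it.

```python
def consistent(phi, psi):
    """True if realization phi agrees with every observation in psi."""
    return all(phi[e] == o for e, o in psi.items())


def covers(psi, f, items, realizations):
    """Check the condition of Definition 7 for the observations psi."""
    dom = frozenset(psi)
    return all(f(dom, phi) == f(frozenset(items), phi)
               for phi in realizations if consistent(phi, psi))


def f(A, phi):
    # Hypothetical objective: one unit for selecting x,
    # plus one more if y is selected and turns out to work.
    return (1 if "x" in A else 0) + (1 if "y" in A and phi["y"] == 1 else 0)


items = ["x", "y"]
realizations = [{"x": 1, "y": s} for s in (0, 1)]
true_phi = realizations[0]      # y is broken in the true realization
psi_x = {"x": 1}                # observations after selecting only x
psi_xy = {"x": 1, "y": 0}       # observations after selecting x and y

reached_max = f({"x"}, true_phi) == f(set(items), true_phi)
certified_after_x = covers(psi_x, f, items, realizations)
certified_after_xy = covers(psi_xy, f, items, realizations)
```

After selecting only `x`, the realization in which `y` works is still consistent with the observations, so the policy must also select `y` before it may terminate.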

Minimizing the Average Cost. Before presenting our approximation guarantee for the Adaptive Stochastic
Minimum Cost Coverage, we introduce a special class of instances, called self-certifying instances. We
make this distinction because the greedy policy has stronger performance guarantees for self-certifying
instances, and such instances arise naturally in applications. For example, the Stochastic Submodular Cover
and Stochastic Set Cover instances in §7, the Adaptive Viral Marketing instances in §8, and the Pool-Based
Active Learning instances in §9 are all self-certifying.

Definition 8 (Self-Certifying Instances) An instance of Adaptive Stochastic Minimum Cost Coverage is
self-certifying if whenever a policy achieves the maximum possible value for the true realization it immediately
has a proof of this fact. Formally, an instance $(f, P[\Phi])$ is self-certifying if for all $\Phi$, $\Phi'$, and $\Psi$ such that
$\Phi \sim \Psi$ and $\Phi' \sim \Psi$, we have $f(\mathrm{dom}(\Psi), \Phi) = f(E, \Phi)$ if and only if $f(\mathrm{dom}(\Psi), \Phi') = f(E, \Phi')$.

For minimum cost coverage, we also need a stronger monotonicity condition: 

Definition 9 (Strong Adaptive Monotonicity) A function $f : 2^E \times O^E \to \mathbb{R}$ is strongly adaptive monotone
with respect to $P[\Phi]$ if, informally, "selecting more items never hurts" with respect to the expected reward.
Formally, for all $\Psi$, all $e \notin \mathrm{dom}(\Psi)$, and all $o \in O$ such that $P[\Phi(e) = o \mid \Phi \sim \Psi] > 0$, we require

\[ \mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) \mid \Phi \sim \Psi,\ \Phi(e) = o]. \tag{14} \]

Strong adaptive monotonicity implies adaptive monotonicity, as the latter means that "selecting more
items never hurts in expectation," i.e., $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) \mid \Phi \sim \Psi]$.
We now state our main result for the average case cost $c_{\mathrm{avg}}(\pi)$:

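Theorem 10 Suppose $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive submodular and strongly adaptive monotone
with respect to $P[\Phi]$ and there exists $Q$ such that $f(E, \Phi) = Q$ for all $\Phi$. Let $\eta$ be any value such that
$f(S, \Phi) > Q - \eta$ implies $f(S, \Phi) = Q$ for all $S$ and $\Phi$. Let $\delta = \min_\Phi P[\Phi]$ be the minimum probability of
any realization. Let $\pi^*_{\mathrm{avg}}$ be an optimal policy minimizing the expected number of items selected to guarantee
every realization is covered. Let $\pi$ be an $\alpha$-approximate greedy policy. Then in general

\[ c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln\left( \frac{Q}{\delta\eta} \right) + 1 \right) \]

and for self-certifying instances

\[ c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln\left( \frac{Q}{\eta} \right) + 1 \right). \]

Note that if $\mathrm{range}(f) \subset \mathbb{Z}$, then $\eta = 1$ is a valid choice, so in this case $c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) (\ln(Q/\delta) + 1)$
and $c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) (\ln(Q) + 1)$ for general and self-certifying instances, respectively.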
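To get a feel for the gap between the two bounds of Theorem 10, the multiplicative factors can be evaluated numerically. The helper name `coverage_factor` and the sample values of $Q$, $\eta$, and $\delta$ below are hypothetical choices for illustration.

```python
import math


def coverage_factor(Q, eta, alpha=1.0, delta=None):
    """Multiplier on c_avg(pi*_avg) in Theorem 10.

    Pass delta (the minimum realization probability) for the general bound;
    leave it as None for the self-certifying bound.
    """
    ratio = Q / eta if delta is None else Q / (delta * eta)
    return alpha * (math.log(ratio) + 1)


# Integer-valued objective (eta = 1) with quota Q = 100 and delta = 0.001.
self_certifying = coverage_factor(100, 1)          # ln(100) + 1, about 5.61
general = coverage_factor(100, 1, delta=1e-3)      # ln(100000) + 1, about 12.51
```

The general bound pays an extra additive $\ln(1/\delta)$, which is why self-certifying instances are singled out.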

Minimizing the Worst-Case Cost. For the worst-case cost $c_{\mathrm{wc}}(\pi) := \max_\Phi |E(\pi, \Phi)|$, strong adaptive
monotonicity is not required; adaptive monotonicity suffices. We obtain the following result.

Theorem 11 Suppose $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive monotone and adaptive submodular with respect to
$P[\Phi]$, and let $\eta$ be any value such that $f(S, \Phi) > f(E, \Phi) - \eta$ implies $f(S, \Phi) = f(E, \Phi)$ for all $S$ and $\Phi$.
Let $\delta = \min_\Phi P[\Phi]$ be the minimum probability of any realization. Let $\pi^*_{\mathrm{wc}}$ be the optimal policy minimizing
the worst-case number of queries to guarantee every realization is covered. Let $\pi$ be an $\alpha$-approximate greedy
policy. Finally, let $Q := \mathbb{E}[f(E, \Phi)]$ be the maximum possible expected reward. Then

\[ c_{\mathrm{wc}}(\pi) \le \alpha\, c_{\mathrm{wc}}(\pi^*_{\mathrm{wc}}) \left( \ln\left( \frac{Q}{\delta\eta} \right) + 1 \right). \]

The proofs of Theorems 10 and 11 are given in Appendix A.4.

Thus, even though adaptive submodularity is defined w.r.t. a particular distribution, perhaps surprisingly,
the adaptive greedy algorithm is competitive even in the case of adversarially chosen realizations, against a
policy optimized to minimize the worst-case cost. Theorem 11 therefore suggests that if we do not have a
strong prior, we can obtain the strongest guarantees if we choose a distribution that is "as uniform as possible"
(i.e., maximizes $\delta$) while still guaranteeing adaptive submodularity.



Discussion. Note that the approximation factor for self-certifying instances in Theorem 10 reduces to the
$(\ln(Q) + 1)$-approximation guarantee for the greedy algorithm for Set Cover instances with $Q$ elements,
in the case of a deterministic distribution $P[\Phi]$. Moreover, with a deterministic distribution $P[\Phi]$ there is
no distinction between average-case and worst-case cost. Hence, an immediate corollary of the result of
Feige (1998) mentioned earlier is that for every constant $\varepsilon > 0$ there is no polynomial time $(1 - \varepsilon) \ln(Q/\eta)$
approximation algorithm for self-certifying instances of Adaptive Stochastic Min Cost Cover, under either the
$c_{\mathrm{avg}}(\cdot)$ or the $c_{\mathrm{wc}}(\cdot)$ objective, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$. It remains open to determine whether or not
Adaptive Stochastic Min Cost Cover with the worst-case cost objective admits a $\ln(Q/\eta) + 1$ approximation
for self-certifying instances via a polynomial time algorithm, and in particular whether the greedy policy has
such an approximation guarantee. However, in Lemma 36 we show that Feige's result also implies there is
no $(1 - \varepsilon) \ln(Q/\delta\eta)$ polynomial time approximation algorithm for general (non self-certifying) instances
of Adaptive Stochastic Min Cost Cover under either objective, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$. In that
sense, each of the three results comprising Theorem 10 and Theorem 11 is best-possible under reasonable
complexity-theoretic assumptions. As we show in Section 9, our result for the average-case cost of greedy
policies for self-certifying instances also matches (up to constant factors) results on hardness of approximating
the optimal policy in the special case of active learning, also known as the Optimal Decision Tree problem.

5.3 The Min-Sum Cover Objective 

Yet another natural objective is the min-sum objective, in which an unrealized reward of x incurs a cost of x in 
each time step, and the goal is to minimize the total cost incurred. 

Background on the Non-adaptive Min-Sum Cover Problem. In the non-adaptive setting, perhaps the
simplest form of a coverage problem with this objective is the Min-Sum Set Cover problem (Feige et al. 2004)
in which the input is a set system $(U, \mathcal{S})$, the output is a permutation of the sets $(S_1, S_2, \ldots, S_m)$, and the
goal is to minimize the sum of element coverage times, where the coverage time of $u$ is the index of the first
set that contains it (e.g., it is $j$ if $u \in S_j$ and $u \notin S_i$ for all $i < j$). In this problem and its generalizations
the min-sum objective is useful in modeling processing costs in certain applications, for example in ordering
diagnostic tests to identify a disease cheaply (Kaplan et al. 2005), in ordering multiple filters to be applied to
database records while processing a query (Munagala et al. 2005), or in ordering multiple heuristics to run
on boolean satisfiability instances as a means to solve them faster in practice (Streeter and Golovin 2008).
A particularly expressive generalization of min-sum set cover has been studied under the names Min-Sum
Submodular Cover (Streeter and Golovin 2008) and $L_1$-Submodular Set Cover (Golovin et al. 2008). The
former paper extends the greedy algorithm to a natural online variant of the problem, while the latter studies a
parameterized family of $L_p$-Submodular Set Cover problems in which the objective is analogous to minimizing
the $L_p$ norm of the coverage times for Min-Sum Set Cover instances. In the Min-Sum Submodular Cover
problem, there is a monotone submodular function $f : 2^E \to \mathbb{R}_{\ge 0}$ defining the reward obtained from a
collection of elements.² There is an integral cost $c(e)$ for each element, and the output is a sequence of all of
the elements $\sigma = (e_1, e_2, \ldots, e_n)$. For each $t \in \mathbb{R}_{\ge 0}$, we define the set of elements in the sequence $\sigma$ within a
budget of $t$:

\[ \sigma_{[t]} := \Big\{ e_i : \sum_{j \le i} c(e_j) \le t \Big\}. \]

2. To encode a Min-Sum Set Cover instance $(U, \mathcal{S})$, let $E := \mathcal{S}$ and $f(A) := |\bigcup_{e \in A} e|$, where each $e \in E$ is a subset of elements in $U$.
The cost we wish to minimize is then

\[ c_\Sigma(\sigma) := \sum_{t=0}^{\infty} \left( f(E) - f(\sigma_{[t]}) \right). \tag{15} \]
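For Min-Sum Set Cover instances with unit costs, the cost (15) coincides with the sum of element coverage times, since an element first covered by the $j$-th set contributes to the terms $t = 0, \ldots, j - 1$. The sketch below checks this on a toy instance; the function names and the instance are hypothetical, and unit costs are assumed so that $\sigma_{[t]}$ is simply the first $t$ elements of the sequence.

```python
def min_sum_cost(sequence, f, ground_set):
    """Evaluate c_Sigma(sigma) from (15) for unit costs.

    Assumes the sequence lists every element of the ground set and f is monotone.
    """
    full = f(frozenset(ground_set))
    total, t = 0, 0
    while t <= len(sequence):
        residual = full - f(frozenset(sequence[:t]))
        if residual == 0:
            break  # all later terms of the sum are zero for monotone f
        total += residual
        t += 1
    return total


def sum_of_coverage_times(sequence, sets, universe):
    """Sum over elements u of the index of the first set containing u."""
    return sum(next(j for j, name in enumerate(sequence, start=1) if u in sets[name])
               for u in universe)


# Hypothetical Min-Sum Set Cover instance.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
universe = {1, 2, 3, 4, 5}


def f(A):
    return len(set().union(*(sets[e] for e in A)))


cost_abc = min_sum_cost(["a", "b", "c"], f, list(sets))
cost_cba = min_sum_cost(["c", "b", "a"], f, list(sets))
```

Ordering the large set first gives cost 8, while the reverse order gives 11, which matches the intuition that covering many elements early is cheaper.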



Feige et al. (2004) proved that for Min-Sum Set Cover, the greedy algorithm achieves a 4-approximation to
the minimum cost, and also that this is optimal in the sense that no polynomial time algorithm can achieve
a $(4 - \varepsilon)$-approximation, for any $\varepsilon > 0$, unless P = NP. Interestingly, the greedy algorithm also achieves a
4-approximation for the more general Min-Sum Submodular Cover problem (Streeter and Golovin
2008; Golovin et al. 2008).



The Adaptive Stochastic Min-Sum Cover Problem. In this article, we extend the results of Streeter and
Golovin (2008) and Golovin et al. (2008) to an adaptive version of Min-Sum Submodular Cover. For clarity's
sake we will consider the unit-cost case here (i.e., $c(e) = 1$ for all $e$); we show how to extend adaptive
submodularity to handle general costs in the Appendix. In the adaptive version of the problem, $\pi_{[t]}$ plays the
role of $\sigma_{[t]}$, and $f_{\mathrm{avg}}$ plays the role of $f$. The goal is to find a policy $\pi$ minimizing

\[ c_\Sigma(\pi) := \sum_{t=0}^{\infty} \left( \mathbb{E}[f(E, \Phi)] - f_{\mathrm{avg}}(\pi_{[t]}) \right) = \sum_{\Phi} P[\Phi] \sum_{t=0}^{\infty} \left( f(E, \Phi) - f(E(\pi_{[t]}, \Phi), \Phi) \right). \]

We call this problem the Adaptive Stochastic Min-Sum Cover problem. The key difference between this
objective and the minimum cost cover objective is that here, the cost at each step is only the fractional extent
that we have not covered the true realization, whereas in the minimum cost cover objective we are charged
in full in each step until we have completely covered the true realization (according to Definition 7). We
prove the following result for the Adaptive Stochastic Min-Sum Cover problem with arbitrary item costs in
Appendix A.5.



Theorem 12 Fix any $\alpha \ge 1$. If $f$ is adaptive monotone and adaptive submodular with respect to the
distribution $P[\Phi]$, $\pi$ is an $\alpha$-approximate greedy policy with respect to the item costs, and $\pi^*$ is any policy,
then $c_\Sigma(\pi) \le 4\alpha\, c_\Sigma(\pi^*)$.



6. Application: Stochastic Submodular Maximization 

As our first application, consider the sensor placement problem introduced in §1. Suppose we would like to
monitor a spatial phenomenon such as temperature in a building. We discretize the environment into a set $E$ of
locations. We would like to pick a subset $A \subseteq E$ of $k$ locations that is most "informative", where we use a set
function $f(A)$ to quantify the informativeness of placement $A$. Krause and Guestrin (2007) show that many
natural objective functions (such as reduction in predictive uncertainty measured in terms of Shannon entropy
with conditionally independent observations) are monotone submodular.

Now consider the problem where sensors can fail or partially fail (e.g., be subject to some varying amount
of noise) after deployment. We can model this extension by assigning a state $\Phi(e) \in O$ to each possible
location, indicating the extent to which a sensor placed at location $e$ is working. To quantify the value of a set
of sensor deployments under a realization $\Phi$ indicating to what extent the various sensors are working, we first
define $(e, o)$ for each $e \in E$ and $o \in O$, which represents the placement of a sensor at location $e$ which is in
state $o$. We then suppose there is a function $\hat{f} : 2^{E \times O} \to \mathbb{R}_{\ge 0}$ which quantifies the informativeness of a set
of sensor deployments in arbitrary states. The utility $f(A, \Phi)$ of placing sensors at the locations in $A$ under
realization $\Phi$ is then

\[ f(A, \Phi) := \hat{f}(\{(e, \Phi(e)) : e \in A\}). \]

We aim to adaptively place $k$ sensors to maximize our expected utility. We assume that sensor failures at
each location are independent of each other, i.e., $P[\Phi] = \prod_{e \in E} P[\Phi(e)]$, where $P[\Phi(e) = o]$ is the probability
that a sensor placed at location $e$ will be in state $o$. Asadpour et al. (2008) studied a special case of our problem,
in which sensors either fail completely (in which case they contribute no value at all) or work perfectly, under
the name Stochastic Submodular Maximization. They proved that the adaptive greedy algorithm obtains a
$(1 - 1/e)$ approximation to the optimal adaptive policy, provided $\hat{f}$ is monotone submodular. We extend their
result to multiple types of failures by showing that $f(A, \Phi)$ is adaptive submodular with respect to distribution
$P[\Phi]$ and then invoking Theorem 5. Fig. 1 illustrates an instance of the Stochastic Submodular Maximization
problem where $f(A, \Phi)$ is the cardinality of the union of sets indexed by $A$ and parameterized by $\Phi$.

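Theorem 13 Fix a prior such that $P[\Phi] = \prod_{e \in E} P[\Phi(e)]$ and an integer $k$, and let the objective function
$\hat{f} : 2^{E \times O} \to \mathbb{R}_{\ge 0}$ be monotone submodular. Let $\pi$ be any $\alpha$-approximate greedy policy attempting to
maximize $f$, and let $\pi^*$ be any policy. Then for all positive integers $\ell$,

\[ f_{\mathrm{avg}}(\pi_{[\ell]}) > \left(1 - e^{-\ell/\alpha k}\right) f_{\mathrm{avg}}(\pi^*_{[k]}). \]

In particular, if $\pi$ is the greedy policy (i.e., $\alpha = 1$) and $\ell = k$, then $f_{\mathrm{avg}}(\pi_{[k]}) \ge \left(1 - \frac{1}{e}\right) f_{\mathrm{avg}}(\pi^*_{[k]})$.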



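Proof. We prove Theorem 13 by first proving $f$ is adaptive monotone and adaptive submodular in this
model, and then applying Theorem 5. Adaptive monotonicity is readily proved after observing that $f(\cdot, \Phi)$
is monotone for each $\Phi$. Moving on to adaptive submodularity, fix any $\Psi$, $\Psi'$ such that $\Psi \subseteq \Psi'$ and any
$e \notin \mathrm{dom}(\Psi')$. We aim to show $\Delta(e \mid \Psi') \le \Delta(e \mid \Psi)$. Intuitively, this is clear, as $\Delta(e \mid \Psi')$ is the expected
marginal benefit of adding $e$ to a larger base set than is the case with $\Delta(e \mid \Psi)$, namely $\mathrm{dom}(\Psi')$ as compared
to $\mathrm{dom}(\Psi)$, and the realizations are independent. To prove it rigorously, we define a coupled distribution
$\mu$ over pairs of realizations $\Phi \sim \Psi$ and $\Phi' \sim \Psi'$ such that $\Phi(e') = \Phi'(e')$ for all $e' \notin \mathrm{dom}(\Psi')$. Formally,
$\mu(\Phi, \Phi') = \prod_{e \in E \setminus \mathrm{dom}(\Psi)} P[\Phi(e)]$ if $\Phi \sim \Psi$, $\Phi' \sim \Psi'$, and $\Phi(e') = \Phi'(e')$ for all $e' \notin \mathrm{dom}(\Psi')$; otherwise
$\mu(\Phi, \Phi') = 0$. (Note that $\mu(\Phi, \Phi') > 0$ implies $\Phi(e') = \Phi'(e')$ for all $e' \in \mathrm{dom}(\Psi)$ as well, since
$\Phi \sim \Psi$, $\Phi' \sim \Psi'$, and $\Psi \subseteq \Psi'$.) Also note that $P[\Phi \mid \Psi] = \sum_{\Phi'} \mu(\Phi, \Phi')$ and $P[\Phi' \mid \Psi'] = \sum_{\Phi} \mu(\Phi, \Phi')$.
Calculating $\Delta(e \mid \Psi')$ and $\Delta(e \mid \Psi)$ using $\mu$, we see that for any $(\Phi, \Phi')$ in the support of $\mu$,

\[ \begin{aligned}
f(\mathrm{dom}(\Psi') \cup \{e\}, \Phi') - f(\mathrm{dom}(\Psi'), \Phi') &= \hat{f}(\Psi' \cup \{(e, \Phi'(e))\}) - \hat{f}(\Psi') \\
&\le \hat{f}(\Psi \cup \{(e, \Phi(e))\}) - \hat{f}(\Psi) \\
&= f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi)
\end{aligned} \]

from the submodularity of $\hat{f}$. Hence

\[ \begin{aligned}
\Delta(e \mid \Psi') &= \sum_{(\Phi, \Phi')} \mu(\Phi, \Phi') \left( f(\mathrm{dom}(\Psi') \cup \{e\}, \Phi') - f(\mathrm{dom}(\Psi'), \Phi') \right) \\
&\le \sum_{(\Phi, \Phi')} \mu(\Phi, \Phi') \left( f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi) \right) = \Delta(e \mid \Psi)
\end{aligned} \]

which completes the proof. ■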
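The inequality $\Delta(e \mid \Psi') \le \Delta(e \mid \Psi)$ can also be confirmed numerically on tiny instances by enumerating all realizations. The sketch below is a sanity check under assumptions, not part of the proof: it assumes independent item states with positive probabilities, and the helper names and both toy objectives (`coverage`, which is monotone submodular on $E \times O$, and `squared`, which is not submodular) are hypothetical.

```python
from itertools import product


def exact_gain(e, psi, f_hat, prob):
    """Delta(e | psi) for independent item states, by enumerating realizations."""
    free = [x for x in prob if x not in psi]  # unobserved items, includes e
    base = frozenset(psi.items())
    total = 0.0
    for values in product(*[prob[x] for x in free]):
        phi = dict(zip(free, values))
        weight = 1.0
        for x, s in phi.items():
            weight *= prob[x][s]
        total += weight * (f_hat(base | {(e, phi[e])}) - f_hat(base))
    return total


def partial_realizations(prob):
    """All partial realizations: each item is unobserved (None) or in one state."""
    items = list(prob)
    for choice in product(*[[None] + list(prob[x]) for x in items]):
        yield {x: s for x, s in zip(items, choice) if s is not None}


def is_adaptive_submodular(f_hat, prob, tol=1e-12):
    """Check Delta(e | psi2) <= Delta(e | psi) whenever psi is contained in psi2."""
    for psi in partial_realizations(prob):
        for psi2 in partial_realizations(prob):
            if not psi.items() <= psi2.items():
                continue
            for e in prob:
                if e in psi2:
                    continue
                if exact_gain(e, psi2, f_hat, prob) > exact_gain(e, psi, f_hat, prob) + tol:
                    return False
    return True


cover = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
prob = {e: {"ok": 0.7, "fail": 0.3} for e in cover}


def coverage(S):
    return len(set().union(*(cover[e] for e, s in S if s == "ok")))


def squared(S):
    return sum(1 for _, s in S if s == "ok") ** 2
```

With these definitions, `is_adaptive_submodular(coverage, prob)` holds, while `squared` fails the check because the gain of a working sensor grows with the number of working sensors already placed.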



7. Application: Stochastic Submodular Coverage 

Suppose that instead of wishing to adaptively place $k$ unreliable sensors to maximize the utility of the
information obtained, as discussed in §6, we have a quota on utility and wish to adaptively place the minimum
number of unreliable sensors to achieve this quota. This amounts to a minimum-cost coverage version of
the Stochastic Submodular Maximization problem introduced in §6, which we call Stochastic Submodular
Coverage.

As in §6, in the Stochastic Submodular Coverage problem we suppose there is a function $\hat{f} : 2^{E \times O} \to \mathbb{R}_{\ge 0}$
which quantifies the utility of a set of sensors in arbitrary states. Also, the states of each sensor are independent,
so that $P[\Phi] = \prod_{e \in E} P[\Phi(e)]$. The goal is to obtain a quota $Q$ of utility at minimum cost. Thus, we define
our objective as $f(A, \Phi) := \min\{Q, \hat{f}(\{(e, \Phi(e)) : e \in A\})\}$, and want to find a policy $\pi$ covering every
realization and minimizing $c_{\mathrm{avg}}(\pi) := \mathbb{E}[|E(\pi, \Phi)|]$. We additionally assume that this quota can always be
obtained using sufficiently many sensor placements; formally, this amounts to $f(E, \Phi) = Q$ for all $\Phi$. We
obtain the following result, whose proof we defer until the end of this section.

Figure 2: Illustration of part of a Stochastic Set Cover instance. Shown are the supports of two distributions
over sets, indexed by items $e$ (marked in blue) and $e'$ (yellow).

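Theorem 14 Fix a prior such that $P[\Phi] = \prod_{e \in E} P[\Phi(e)]$ and let the objective function $\hat{f} : 2^{E \times O} \to \mathbb{R}_{\ge 0}$
be a monotone submodular function. Fix $Q \in \mathbb{R}_{\ge 0}$ such that $f(A, \Phi) := \min\{Q, \hat{f}(\{(e, \Phi(e)) : e \in A\})\}$
satisfies $f(E, \Phi) = Q$ for all $\Phi$. Let $\eta$ be any value such that $f(S, \Phi) > Q - \eta$ implies $f(S, \Phi) = Q$ for all $S$
and $\Phi$. Finally, let $\pi$ be an $\alpha$-approximate greedy policy for maximizing $f$, and let $\pi^*$ be any policy. Then

\[ c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*) \left( \ln\left( \frac{Q}{\eta} \right) + 1 \right). \]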

A Special Case: The Stochastic Set Coverage Problem. The Stochastic Submodular Coverage problem
is a generalization of the Stochastic Set Coverage problem (Goemans and Vondrák 2006). In Stochastic
Set Coverage the underlying submodular objective $\hat{f}$ is the number of elements covered in some input set
system. In other words, there is a ground set $U$ of $n$ elements to be covered, and items $E$ such that each
item $e$ is associated with a distribution over subsets of $U$. When an item is selected, a set is sampled from
its distribution, as illustrated in Fig. 2. The problem is to adaptively select items until all elements of $U$ are
covered by sampled sets, while minimizing the expected number of items selected. Like us, Goemans and
Vondrák also assume that the subsets are sampled independently for each item, and every element of $U$ can be
covered in every realization, so that $f(E, \Phi) = |U|$ for all $\Phi$.

Goemans and Vondrák primarily investigated the adaptivity gap of Stochastic Set Coverage, for variants
in which items can be repeatedly selected or not, and prove adaptivity gaps of $\Theta(\log n)$ in the former case,
and between $\Omega(n)$ and $O(n^2)$ in the latter. They also provide an $n$-approximation algorithm. More recently,
Liu et al. (2008) considered a special case of Stochastic Set Coverage in which each item may be in one of
two states. They were motivated by a streaming database problem, in which a collection of queries sharing
common filters must all be evaluated on a stream element. They transform the problem to a Stochastic Set
Coverage instance in which (filter, query) pairs are to be covered by filter evaluations; which pairs are covered
by a filter depends on the (binary) outcome of evaluating it on the stream element. The resulting instances
satisfy the assumption that every element of $U$ can be covered in every realization. They study, among other
algorithms, the adaptive greedy algorithm specialized to this setting, and show that if the subsets are sampled
independently for each item, so that $P[\Phi] = \prod_e P[\Phi(e)]$, then it is an $H_n := \sum_{x=1}^{n} \frac{1}{x}$ approximation. (Recall
$\ln(n) \le H_n \le \ln(n) + 1$ for all $n \ge 1$.) Moreover, Liu et al. report that it empirically outperforms a number
of other algorithms in their experiments.
The adaptive submodularity framework allows us to recover Liu et al.'s result, and generalize it to richer
item distributions over subsets of $U$, all as a corollary of Theorem 14. Specifically, we obtain a $(\ln(n) + 1)$
approximation for the Stochastic Set Coverage problem, where $n := |U|$, which matches the approximation
ratio for the greedy algorithm for classical Set Cover that Stochastic Set Coverage generalizes. Like Liu et al.'s
result, our result is tight if $\mathrm{NP} \not\subseteq \mathrm{DTIME}(n^{O(\log\log n)})$, since it matches Feige's lower bound of $(1 - \varepsilon) \ln n$
for the approximability of Set Cover under that assumption (Feige 1998).



We model the Stochastic Set Coverage problem by letting $\Phi(e) \subseteq U$ indicate the random set sampled from
$e$'s distribution. Since the sampled sets are independent we have $P[\Phi] = \prod_e P[\Phi(e)]$. For any $A \subseteq E$ let
$f(A, \Phi) := |\bigcup_{e \in A} \Phi(e)|$ be the number of elements of $U$ covered by the sets sampled from items in $A$. As in
the previous work mentioned above, we assume $f(E, \Phi) = n$ for all $\Phi$. Therefore we may set $Q = n$. Since
the range of $f$ includes only integers, we may set $\eta = 1$. Applying Theorem 14 then yields the following
result.

Corollary 15 The adaptive greedy algorithm achieves a $(\ln(n) + 1)$-approximation for Stochastic Set
Coverage, where $n := |U|$ is the size of the ground set.
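A minimal sketch of the adaptive greedy policy for Stochastic Set Coverage with unit costs is given below. The function name, the dictionary layout of `dists` and `realized`, and the toy instance are hypothetical. The sketch relies on the assumption stated above that every element of $U$ is covered in every realization; without it the loop could exhaust the items before terminating.

```python
def greedy_stochastic_set_cover(universe, dists, realized):
    """Adaptive greedy for Stochastic Set Coverage (unit costs).

    dists[e] maps each possible subset (a frozenset) to its probability;
    realized[e] is the subset actually sampled for item e.
    """
    covered, chosen = set(), []
    while covered != set(universe):
        remaining = [e for e in dists if e not in chosen]

        def gain(e):
            # Expected number of newly covered elements if e is selected.
            return sum(p * len(s - covered) for s, p in dists[e].items())

        best = max(remaining, key=gain)
        chosen.append(best)
        covered |= realized[best]  # observe the set sampled for the chosen item
    return chosen


# Hypothetical instance with U = {1, 2, 3}; item a always covers 1 and
# item b always covers {2, 3}, so every realization covers U.
universe = {1, 2, 3}
dists = {
    "a": {frozenset({1, 2}): 0.5, frozenset({1}): 0.5},
    "b": {frozenset({2, 3}): 1.0},
    "c": {frozenset({3}): 0.5, frozenset(): 0.5},
}
realized = {"a": frozenset({1}), "b": frozenset({2, 3}), "c": frozenset()}
chosen = greedy_stochastic_set_cover(universe, dists, realized)
```

On this instance the policy first picks `b` (expected gain 2), then `a` (expected gain 1 once 2 and 3 are covered), and stops after two items.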



We now provide the proof of Theorem 14.



Proof of Theorem 14. We will ultimately prove Theorem 14 by applying the bound from Theorem 10 for
self-certifying instances. The proof mostly consists of justifying this final step. Without loss of generality we
may assume $\hat{f}$ is truncated at $Q$, otherwise we may use $\hat{g}(S) = \min\{Q, \hat{f}(S)\}$ in lieu of $\hat{f}$. This removes
the need to truncate $\hat{f}$. Since we established the adaptive submodularity of $f$ in the proof of Theorem 13, and
by assumption $f(E, \Phi) = Q$ for all $\Phi$, to apply Theorem 10 we need only show that $f$ is strongly adaptive
monotone, and that the instances under consideration are self-certifying.

We begin by showing the strong adaptive monotonicity of $f$. Fix a partial realization $\Psi$, an item $e \notin
\mathrm{dom}(\Psi)$ and a state $o$. Let $\Psi' = \Psi \cup \{(e, o)\}$. Then treating $\Psi$ and $\Psi'$ as subsets of $E \times O$, and using the
monotonicity of $\hat{f}$, we obtain

\[ \mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] = \hat{f}(\Psi) \le \hat{f}(\Psi') \le \mathbb{E}[f(\mathrm{dom}(\Psi'), \Phi) \mid \Phi \sim \Psi'], \]

which is equivalent to the strong adaptive monotonicity condition.

Next we prove that these instances are self-certifying. Consider any $\Psi$ and $\Phi$, $\Phi'$ consistent with $\Psi$. Then

\[ f(\mathrm{dom}(\Psi), \Phi) = \hat{f}(\Psi) = f(\mathrm{dom}(\Psi), \Phi'). \]

Since $f(E, \Phi) = f(E, \Phi') = Q$ by assumption, it follows that $f(\mathrm{dom}(\Psi), \Phi) = f(E, \Phi)$ iff $f(\mathrm{dom}(\Psi), \Phi') =
f(E, \Phi')$, so the instance is self-certifying.

We have shown that $f$ and $P[\Phi]$ satisfy the assumptions of Theorem 10 on this self-certifying instance.
Hence we may apply it to obtain the claimed approximation guarantee. ■



8. Application: Adaptive Viral Marketing 

For our next application, consider the following scenario. Suppose we would like to generate demand for a 
genuinely novel product. Potential customers do not realize how valuable the new product will be to them, and 
conventional advertisements are failing to convince them to try it. In this case, we may try to spur demand by 
offering a special promotional deal to a select few people, and hope that demand builds virally, propagating 
through the social network as people recommend the product to their friends and associates. Supposing we 
know something about the structure of the social networks people inhabit, and how ideas, innovation, and new 
product adoption diffuse through them, this raises the question: to which initial set of people should we offer
the promotional deal, in order to spur maximum demand for our product? 

This, broadly, is the viral marketing problem. The same problem arises in the context of spreading
technological, cultural, and intellectual innovations, broadly construed. In the interest of unified terminology
we follow Kempe et al. (2003) and talk of spreading influence through the social network, where we say people
are active if they have adopted the idea or innovation in question, and inactive otherwise, and that $a$ influences
$b$ if $a$ convinces $b$ to adopt the idea or innovation in question.

Figure 3: Illustration of the Adaptive Viral Marketing problem. Left: the underlying social network. Middle:
the people influenced and the observations obtained after one person is selected.

There are many ways to model the diffusion dynamics governing the spread of influence in a social network.
We consider a basic and well-studied model, the independent cascade model, described in detail below. For
this model Kempe et al. obtain a very interesting result; they show that the eventual spread of the influence $f$
(i.e., the ultimate number of customers that demand the product) is a monotone submodular function of the
seed set $S$ of people initially selected. This, in conjunction with the results of Nemhauser et al. (1978), implies
that the greedy algorithm obtains at least $\left(1 - \frac{1}{e}\right)$ of the value of the best feasible seed set of size at most $k$,
i.e., $\arg\max_{S : |S| \le k} f(S)$, where we interpret $k$ as the budget for the promotional campaign. Though Kempe
et al. consider only the maximum coverage version of the viral marketing problem, their result in conjunction
with that of Wolsey (1982) also implies that the greedy algorithm will obtain a quota $Q$ of value at a cost of
at most $\ln(Q) + 1$ times the cost of the optimal set $\arg\min_S \{c(S) : f(S) \ge Q\}$ if $f$ takes on only integral
values.

Adaptive Viral Marketing. The viral marketing problem has a natural adaptive analog. Instead of selecting
a fixed set of people in advance, we may select a person to offer the promotion to, make some observations
about the resulting spread of demand for our product, and repeat. See Fig. 3 for an illustration. In §8.1 we use
the idea of adaptive submodularity to obtain results analogous to those of Kempe et al. (2003) in the adaptive
setting. Specifically, we show that the greedy policy obtains at least $\left(1 - \frac{1}{e}\right)$ of the value of the best policy.
Moreover, we extend this result by achieving that guarantee not only for the case where our reward is simply
the number of influenced people, but also for any (nonnegative) monotone submodular function of the set of
people influenced. In §8.2 we consider the minimum cost cover objective, and show that the greedy policy
obtains a logarithmic approximation for it.

Independent Cascade Model. In this model, the social network is a directed graph $G = (V, A)$ where each
vertex in $V$ is a person, and each edge $(u, v) \in A$ has an associated binary random variable $X_{uv}$ indicating if
$u$ will influence $v$. That is, $X_{uv} = 1$ if $u$ will influence $v$ once it has been influenced, and $X_{uv} = 0$ otherwise.
The random variables $X_{uv}$ are independent, and have known means $p_{uv} := E[X_{uv}]$. We will call an edge
$(u, v)$ with $X_{uv} = 1$ a live edge and an edge with $X_{uv} = 0$ a dead edge. When a node $u$ is activated, the edges
$X_{uv}$ to each neighbor $v$ of $u$ are sampled, and $v$ is activated if $(u, v)$ is live. Influence can then spread from
$u$'s neighbors to their neighbors, and so on, according to the same process. Once active, nodes remain active
throughout the process; however, Kempe et al. (2003) show that this assumption is without loss of generality,
and can be removed.
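
As a minimal sketch of the independent cascade process just described, the following Python snippet samples a live/dead status for every edge and then computes the nodes activated from a seed set by reachability over live edges. The function names, the toy network, and the use of a seeded `random.Random` are illustrative assumptions, not part of the original paper.

```python
import random
from collections import deque

def sample_live_edges(edge_probs, rng):
    """Draw a status for every edge and return the set of live edges."""
    return {edge for edge, p in edge_probs.items() if rng.random() < p}

def activated_nodes(seeds, live_edges):
    """Nodes reachable from the seed set via live edges (breadth-first search)."""
    successors = {}
    for u, v in live_edges:
        successors.setdefault(u, []).append(v)
    active = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in successors.get(u, []):
            if v not in active:
                active.add(v)
                queue.append(v)
    return active

# Hypothetical 4-node network; probabilities 1.0 and 0.0 make this run deterministic.
edge_probs = {("a", "b"): 1.0, ("b", "c"): 0.0, ("c", "d"): 1.0}
live = sample_live_edges(edge_probs, random.Random(0))
spread = activated_nodes({"a"}, live)
```

Sampling all edges up front and then following live edges is equivalent to sampling them lazily as nodes activate, because the edge variables are independent.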

The Feedback Model. In the Adaptive Viral Marketing problem under the independent cascade model, the
items correspond to people we can activate by offering them the promotional deal. How we define the states
$\Phi(u)$ depends on what information we obtain as a result of activating $u$. Given the nature of the diffusion
process, activating $u$ can have wide-ranging effects, so the state $\Phi(u)$ has more to do with the state of the social
network on the whole than with $u$ in particular. Specifically, we model $\Phi(u)$ as a function $\Phi_u : A \to \{0, 1, ?\}$,
where $\Phi_u((u, v)) = 0$ means that activating $u$ has revealed that $(u, v)$ is dead, $\Phi_u((u, v)) = 1$ means that
activating $u$ has revealed that $(u, v)$ is live, and $\Phi_u((u, v)) = ?$ means that activating $u$ has not revealed the
status of $(u, v)$ (i.e., the value of $X_{uv}$). We require each realization to be consistent and complete. Consistency
means that no edge should be declared both live and dead by any two states. That is, for all $u, v \in V$ and
$a \in A$, $(\Phi_u(a), \Phi_v(a)) \notin \{(0, 1), (1, 0)\}$. Completeness means that the status of each edge is revealed by
some activation. That is, for all $a \in A$ there exists $u \in V$ such that $\Phi_u(a) \in \{0, 1\}$. A consistent and complete
realization thus encodes $X_{uv}$ for each edge $(u, v)$. Let $A(\Phi)$ denote the live edges as encoded by $\Phi$. There are
several candidates for which edge sets we are allowed to observe when activating a node $u$. Here we consider
what we call the Full-Adoption Feedback Model: After activating $u$ we get to see the status (live or dead) of all
edges exiting $v$, for all nodes $v$ reachable from $u$ via live edges (i.e., reachable from $u$ in $(V, A(\Phi))$), where $\Phi$
is the true realization. We illustrate the full-adoption feedback model in Fig. 3.
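
The full-adoption feedback rule can be sketched as follows, assuming the true live-edge set is available to a simulator. The function name `full_adoption_feedback` and the toy graph are hypothetical. Edges absent from the returned dictionary play the role of the unrevealed status `?`.

```python
from collections import deque

def full_adoption_feedback(u, edges, live_edges):
    """Return {edge: 1 (live) or 0 (dead)} for every edge exiting a node
    that is reachable from u via live edges, including u itself."""
    reached = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for (a, b) in edges:
            if a == x and (a, b) in live_edges and b not in reached:
                reached.add(b)
                queue.append(b)
    return {(a, b): int((a, b) in live_edges) for (a, b) in edges if a in reached}

# Hypothetical network: u -> v is live, v -> w is dead, so w and y are never reached.
edges = [("u", "v"), ("v", "w"), ("w", "x"), ("y", "x")]
live_edges = {("u", "v"), ("w", "x")}
observed = full_adoption_feedback("u", edges, live_edges)
```

Here activating `u` reveals the live edge `(u, v)` and the dead edge `(v, w)`, while the edges leaving `w` and `y` stay unobserved.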

The Objective Function. In the simplest case, the reward for influencing a set $U \subseteq V$ of nodes is
$\hat{f}(U) := |U|$. Kempe et al. (2003) obtain a $(1 - 1/e)$-approximation for the slightly more general case
in which each node $u$ has a weight $w_u$ indicating its importance, and the reward is $\hat{f}(U) := \sum_{u \in U} w_u$. We
generalize this result further, to include arbitrary nonnegative monotone submodular reward functions $\hat{f}$. This
allows us, for example, to encode a value associated with the diversity of the set of nodes influenced, such as 
the notion that it is better to achieve 20% market penetration in five different (equally important) demographic 
segments than 100% market penetration in one and 0% in the others. 
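
One way to encode such a diversity preference, offered here only as an illustrative assumption rather than the paper's construction, is a sum over segments of a concave function of the fraction influenced. A sum of concave nondecreasing functions of segment counts is monotone submodular. The names `diversity_reward`, `segment_of`, and `segment_sizes` are hypothetical.

```python
import math

def diversity_reward(influenced, segment_of, segment_sizes):
    """Sum over segments of sqrt(fraction of that segment influenced)."""
    counts = {}
    for node in influenced:
        seg = segment_of[node]
        counts[seg] = counts.get(seg, 0) + 1
    return sum(math.sqrt(c / segment_sizes[s]) for s, c in counts.items())

# Five equally sized segments of 10 people each (nodes 0..49).
segment_of = {i: i // 10 for i in range(50)}
segment_sizes = {s: 10 for s in range(5)}

spread_out = {0, 1, 10, 11, 20, 21, 30, 31, 40, 41}   # 20% of every segment
concentrated = set(range(10))                          # 100% of one segment

reward_spread = diversity_reward(spread_out, segment_of, segment_sizes)
reward_concentrated = diversity_reward(concentrated, segment_of, segment_sizes)
```

Both sets contain ten people, yet the spread-out set scores $5\sqrt{0.2} \approx 2.24$ against $1$ for the concentrated one, matching the preference described above.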



8.1 The Maximum Coverage Objective 

We are now ready to formally state our result for the maximum coverage objective. 

Theorem 16 The greedy policy $\pi^{\mathrm{greedy}}$ obtains at least $(1 - 1/e)$ of the value of the best policy for the Adaptive
Viral Marketing problem with arbitrary monotone submodular reward functions, in the independent cascade
and full-adoption feedback models discussed above. That is, if $\sigma(S, \Phi)$ is the set of all activated nodes when $S$
is the seed set of activated nodes and $\Phi$ is the realization, $\hat{f} : 2^V \to \mathbb{R}_{\ge 0}$ is an arbitrary monotone submodular
function indicating the reward for influencing a set, and the objective function is $f(S, \Phi) := \hat{f}(\sigma(S, \Phi))$, then
for all policies $\pi$ and all $k \in \mathbb{N}$ we have

$$f_{\mathrm{avg}}(\pi^{\mathrm{greedy}}_{[k]}) \ge (1 - 1/e) \, f_{\mathrm{avg}}(\pi_{[k]}).$$

More generally, if $\pi$ is an $\alpha$-approximate greedy policy then for all $\ell \in \mathbb{N}$,
$f_{\mathrm{avg}}(\pi_{[\ell]}) \ge (1 - e^{-\ell/(\alpha k)}) \, f_{\mathrm{avg}}(\pi^*_{[k]})$.

Proof Adaptive monotonicity follows immediately from the fact that $f(\cdot, \Phi)$ is monotonic for each $\Phi$. It thus
suffices to prove that $f$ is adaptive submodular with respect to the probability distribution on realizations $P[\Phi]$,
because then we can invoke Theorem 5 to complete the proof.

We will say we have observed an edge $(u, v)$ if we know its status, i.e., if it is live or dead. Fix any $\Psi, \Psi'$
such that $\Psi \subseteq \Psi'$ and any $v \notin \mathrm{dom}(\Psi')$. We must show $\Delta(v \mid \Psi') \le \Delta(v \mid \Psi)$. To prove this rigorously, we
define a coupled distribution $\mu$ over pairs of realizations $\Phi \sim \Psi$ and $\Phi' \sim \Psi'$. Note that given the feedback
model, the realization $\Phi$ is a function of the random variables $\{X_{uw} : (u, w) \in A\}$ indicating the status of
each edge. For conciseness we use the notation $X = \{X_{uw} : (u, w) \in A\}$. We define $\mu$ implicitly in terms
of a joint distribution $\hat{\mu}$ on $X \times X'$, where $\Phi = \Phi(X)$ and $\Phi' = \Phi'(X')$ are the realizations induced by the
two distinct sets of random edge statuses, respectively. Hence $\mu(\Phi(X), \Phi'(X')) = \hat{\mu}(X, X')$. Next, let us say
a partial realization $\Psi$ observes an edge $e$ if some $w \in \mathrm{dom}(\Psi)$ has revealed its status as being live or dead.
For edges $(u, w)$ observed by $\Psi$, the random variable $X_{uw}$ is deterministically set to the status observed by
$\Psi$. Similarly, for edges $(u, w)$ observed by $\Psi'$, the random variable $X'_{uw}$ is deterministically set to the status
observed by $\Psi'$. Note that since $\Psi \subseteq \Psi'$, the state of all edges which are observed by $\Psi$ are the same in $\Phi$
and $\Phi'$. All $(X, X') \in \mathrm{support}(\hat{\mu})$ have these properties. Additionally, we will construct $\hat{\mu}$ so that the status
of all edges which are unobserved by both $\Psi$ and $\Psi'$ are the same in $X$ and $X'$, meaning $X_{uw} = X'_{uw}$ for
all such edges $(u, w)$, or else $\hat{\mu}(X, X') = 0$.

The above constraints leave us with the following degrees of freedom: we may select $X_{uw}$ for all
$(u, w) \in A$ which are unobserved by $\Psi$. We select them independently, such that $E[X_{uw}] = p_{uw}$ as with
$P[\Phi]$. Hence for all $(X, X')$ satisfying the above constraints,

$$\hat{\mu}(X, X') = \prod_{(u,w) \text{ unobserved by } \Psi} p_{uw}^{X_{uw}} (1 - p_{uw})^{1 - X_{uw}},$$

and otherwise $\hat{\mu}(X, X') = 0$. Note that $P[\Phi \mid \Psi] = \sum_{\Phi'} \mu(\Phi, \Phi')$ and $P[\Phi' \mid \Psi'] = \sum_{\Phi} \mu(\Phi, \Phi')$. We next
claim that for all $(\Phi, \Phi') \in \mathrm{support}(\mu)$

$$f(\mathrm{dom}(\Psi') \cup \{v\}, \Phi') - f(\mathrm{dom}(\Psi'), \Phi') \le f(\mathrm{dom}(\Psi) \cup \{v\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi). \qquad (17)$$

Recall $f(S, \Phi) := \hat{f}(\sigma(S, \Phi))$, where $\sigma(S, \Phi)$ is the set of all activated nodes when $S$ is the seed set of
activated nodes and $\Phi$ is the realization. Let $B := \sigma(\mathrm{dom}(\Psi), \Phi)$ and $C := \sigma(\mathrm{dom}(\Psi) \cup \{v\}, \Phi)$ denote
the active nodes before and after selecting $v$ after $\mathrm{dom}(\Psi)$ under realization $\Phi$, and similarly define $B'$
and $C'$ with respect to $\Psi'$ and $\Phi'$. Let $D := C \setminus B$, $D' := C' \setminus B'$. Then Eq. (17) is equivalent to
$\hat{f}(B' \cup D') - \hat{f}(B') \le \hat{f}(B \cup D) - \hat{f}(B)$. By the monotonicity and submodularity of $\hat{f}$, it suffices to show
that $B \subseteq B'$ and $D' \subseteq D$ to prove the above inequality, which we will now do.

We start by proving $B \subseteq B'$. Fix $w \in B$. Then there exists a path from some $u \in \mathrm{dom}(\Psi)$ to $w$ in
$(V, A(\Phi))$. Moreover, every edge in this path is not only live but also observed to be live, by definition of the
feedback model. Since $(\Phi, \Phi') \in \mathrm{support}(\mu)$, this implies that every edge in this path is also live under $\Phi'$, as
edges observed by $\Psi$ must have the same status under both $\Phi$ and $\Phi'$. It follows that there is a path from $u$ to
$w$ in $(V, A(\Phi'))$. Since $u$ is clearly also in $\mathrm{dom}(\Psi')$, we conclude $w \in B'$, hence $B \subseteq B'$.

Next we show $D' \subseteq D$. Fix some $w \in D'$ and suppose by way of contradiction that $w \notin D$. Since $w \notin B'$
and $B \subseteq B'$, we have $w \notin B$, so $w \notin D$ means $w \notin C$. Hence there
exists a path $P$ from $v$ to $w$ in $(V, A(\Phi'))$ but no such path exists in $(V, A(\Phi))$. The edges of $P$ are all live
under $\Phi'$, and at least one must be dead under $\Phi$. Let $(u, u')$ be such an edge in $P$. Because the status of
this edge differs in $\Phi$ and $\Phi'$, and $(\Phi, \Phi') \in \mathrm{support}(\mu)$, it must be that $(u, u')$ is observed by $\Psi'$ but not
observed by $\Psi$. Because it is observed by $\Psi'$, in our feedback model it must be that $u$ is active after $\mathrm{dom}(\Psi')$
is selected, i.e., $u \in B'$. However, this implies that all nodes reachable from $u$ via edges in $P$ are also active
after $\mathrm{dom}(\Psi')$ is selected, since all the edges in $P$ are live. Hence all such nodes, including $w$, are in $B'$.
Since $D'$ and $B'$ are disjoint, this implies $w \notin D'$, a contradiction.

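Having proved Eq. (17), we now proceed to use it to show $\Delta(v \mid \Psi') \le \Delta(v \mid \Psi)$:

$$\Delta(v \mid \Psi') = \sum_{(\Phi, \Phi')} \mu(\Phi, \Phi') \left( f(\mathrm{dom}(\Psi') \cup \{v\}, \Phi') - f(\mathrm{dom}(\Psi'), \Phi') \right)$$
$$\le \sum_{(\Phi, \Phi')} \mu(\Phi, \Phi') \left( f(\mathrm{dom}(\Psi) \cup \{v\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi) \right) = \Delta(v \mid \Psi),$$

which completes the proof. ■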



Comparison with Stochastic Submodular Maximization. It is worth contrasting the Adaptive Viral Marketing
problem with the Stochastic Submodular Maximization problem of §6. In the latter problem, we can
think of the items as being random independently distributed sets. In Adaptive Viral Marketing by contrast, the
random sets (of nodes influenced when a fixed node is selected) depend on the random status of the edges, and
hence may be correlated through them. Nevertheless, we can obtain the same $(1 - 1/e)$ approximation factor
for both problems.



A Comment on the Myopic Feedback Model. In the conference version of this article (Golovin and Krause,
2010), we considered an alternate feedback model called the myopic feedback model, in which after activating
$v$ we see the status of all edges exiting $v$ in the social network, i.e., the edges $\{(v, u) : u \in V\} \cap A$. We
claimed that the objective $f$ as defined previously is adaptive submodular in the independent cascade model
with myopic feedback, and hence the greedy policy obtains a $(1 - 1/e)$ approximation for it. We hereby retract
this claim, and furthermore give a counterexample demonstrating that $f$ is not adaptive submodular under
myopic feedback.

Consider a graph $G = (V, E)$ with vertices $V := \{u, v, w\}$ and edges $E := \{(u, v), (v, w)\}$. The
edge parameters are $p_{uv} = 1$ and $p_{vw} = 1 - \epsilon$. Let $\hat{f}(U) = |U|$ and construct $f$ from $\hat{f}$ accordingly.
We let $\Psi = \{(u, \Phi_u)\}$, where $\Phi_u((u, v)) = 1$ and $\Phi_u((v, w)) = ?$. Let $\Psi' = \{(u, \Phi_u), (v, \Phi_v)\}$ where
$\Phi_v((v, w)) = 0$. Clearly, $\Psi \subseteq \Psi'$. Note $\Delta(w \mid \Psi) = \epsilon$, since the marginal benefit of $w$ over $\mathrm{dom}(\Psi)$ is one
if $(v, w)$ is dead, and zero if it is live, and the former occurs with probability $\epsilon$. In contrast, $\Delta(w \mid \Psi') = 1$,
since $\Psi'$ contains the observation that $(v, w)$ is dead. Hence $\Delta(w \mid \Psi) < \Delta(w \mid \Psi')$ whenever $\epsilon < 1$, which violates adaptive
submodularity. However, we conjecture that the greedy policy still obtains a constant factor approximation
even in the myopic feedback model.
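
The counterexample can be checked numerically with exact arithmetic. The sketch below enumerates the two possible statuses of the edge from v to w and computes the expected marginal benefit of seeding w after u, first when that edge is unobserved and then when it has been observed dead. The helper names and the choice of one tenth for the small parameter are assumptions made for illustration.

```python
from fractions import Fraction

def reachable(seeds, live_edges):
    """Nodes reachable from the seeds via live edges."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        x = frontier.pop()
        for (a, b) in live_edges:
            if a == x and b not in active:
                active.add(b)
                frontier.append(b)
    return active

def marginal_gain_of_w(prob_vw_live):
    """Expected gain in |active set| from seeding w after u, given that (u, v)
    is known to be live and (v, w) is live with probability prob_vw_live."""
    gain = Fraction(0)
    for vw_live, prob in ((True, prob_vw_live), (False, 1 - prob_vw_live)):
        live = {("u", "v")} | ({("v", "w")} if vw_live else set())
        before = reachable({"u"}, live)
        after = reachable({"u", "w"}, live)
        gain += prob * (len(after) - len(before))
    return gain

eps = Fraction(1, 10)
gain_small = marginal_gain_of_w(1 - eps)        # (v, w) not yet observed
gain_large = marginal_gain_of_w(Fraction(0))    # (v, w) observed to be dead
```

With more information the gain rises from one tenth to one, which is exactly the violation of adaptive submodularity described above.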

8.2 The Minimum Cost Cover Objective

For the minimum cost cover objective, we have an instance of Adaptive Stochastic Minimum Cost Cover,
in which we are given a quota $Q \le \hat{f}(V)$ and we must adaptively select nodes to activate until the set of all
active nodes $S$ satisfies $\hat{f}(S) \ge Q$. We obtain the following result.

Theorem 17 Fix a monotone submodular function $\hat{f} : 2^V \to \mathbb{R}_{\ge 0}$ indicating the reward for influencing a set,
and a quota $Q \le \hat{f}(V)$. Suppose the objective function is $f(S, \Phi) := \min\{Q, \hat{f}(\sigma(S, \Phi))\}$, where $\sigma(S, \Phi)$
is the set of all activated nodes when $S$ is the seed set of activated nodes and $\Phi$ is the realization. Let $\eta$ be
any value such that $\hat{f}(S) > Q - \eta$ implies $\hat{f}(S) \ge Q$ for all $S$. Then any $\alpha$-approximate greedy policy $\pi$ on
average costs at most $\alpha (\ln(Q/\eta) + 1)$ times the average cost of the best policy obtaining $Q$ reward for the
Adaptive Viral Marketing problem in the independent cascade model with full-adoption feedback as described
above. That is, $c_{\mathrm{avg}}(\pi) \le \alpha (\ln(Q/\eta) + 1) \, c_{\mathrm{avg}}(\pi^*)$ for any $\pi^*$ that covers every realization.

Proof We prove Theorem 17 by recourse to Theorem 10. We have already established that $f$ is adaptive
submodular, in the proof of Theorem 16. It remains to show that $f$ is strongly adaptive monotone, that these
instances are self-certifying, and that $Q$ and $\eta$ equal the corresponding terms in the statement of Theorem 10.
We start with strong adaptive monotonicity. Fix $\Psi$, $e \notin \mathrm{dom}(\Psi)$, and $o \in O$. We must show

$$E[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le E[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) \mid \Phi \sim \Psi, \Phi(e) = o]. \qquad (18)$$

Let $V^+(\Psi)$ denote the active nodes after selecting $\mathrm{dom}(\Psi)$ and observing $\Psi$. By definition of the
full-adoption feedback model, $V^+(\Psi)$ consists of precisely those nodes $v$ for which there exists a path $P_{uv}$ from
some $u \in \mathrm{dom}(\Psi)$ to $v$ via exclusively live edges. The edges whose status we observe consist of all edges
exiting nodes in $V^+(\Psi)$. It follows that every path from any $u \in V^+(\Psi)$ to any $v \in V \setminus V^+(\Psi)$ contains
at least one edge which is observed by $\Psi$ to be dead. Hence, in every $\Phi \sim \Psi$, the set of nodes activated
by selecting $\mathrm{dom}(\Psi)$ is the same. Therefore $E[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] = \min\{Q, \hat{f}(V^+(\Psi))\}$. Similarly, if we
define $\Psi' := \Psi \cup \{(e, o)\}$, then $E[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) \mid \Phi \sim \Psi, \Phi(e) = o] = \min\{Q, \hat{f}(V^+(\Psi'))\}$. Note that once
activated, nodes never become inactive. Hence, $\Psi \subseteq \Psi'$ implies $V^+(\Psi) \subseteq V^+(\Psi')$. Since $\hat{f}$ is monotone by
assumption, this means $\min\{Q, \hat{f}(V^+(\Psi))\} \le \min\{Q, \hat{f}(V^+(\Psi'))\}$, which implies Eq. (18) and strong adaptive monotonicity.

Next we establish that these instances are self-certifying. Note that for every $\Phi$ we have $f(V, \Phi) =
\min\{Q, \hat{f}(V)\} = Q$. From our earlier remarks, we know that $f(\mathrm{dom}(\Psi), \Phi) = \min\{Q, \hat{f}(V^+(\Psi))\}$ for every
$\Phi \sim \Psi$. Hence for all $\Psi$ and $\Phi, \Phi'$ consistent with $\Psi$, we have $f(\mathrm{dom}(\Psi), \Phi) = f(\mathrm{dom}(\Psi), \Phi')$, and so
$f(\mathrm{dom}(\Psi), \Phi) = Q$ if and only if $f(\mathrm{dom}(\Psi), \Phi') = Q$, which proves that the instance is self-certifying.

Finally we show that $Q$ and $\eta$ equal the corresponding terms in the statement of Theorem 10. As noted
earlier, $f(V, \Phi) = Q$ for all $\Phi$. We defined $\eta$ as some value such that $\hat{f}(S) > Q - \eta$ implies $\hat{f}(S) \ge Q$ for all
$S$. Since $\mathrm{range}(f) = \{\min\{Q, \hat{f}(S)\} : S \subseteq V\}$, it follows that we cannot have $f(S, \Phi) \in (Q - \eta, Q)$ for
any $S$ and $\Phi$, so that $\eta$ satisfies the requirements of the corresponding term in Theorem 10. Hence we may
apply Theorem 10 on this self-certifying instance with $Q$ and $\eta$ to obtain the claimed result. ■



9. Application: Active Learning 



Obtaining labeled data to train a classifier is typically expensive, as it often involves asking an expert. In
active learning (c.f., Cohn et al. (1996), McCallum and Nigam (1998)), the key idea is that some labels
are more informative than others: labeling a few unlabeled examples can imply the labels of many other
unlabeled examples, and thus the cost of obtaining the labels from an expert can be avoided. As is standard,
we assume that we are given a set of hypotheses $H$, and a set of unlabeled data points $X$ where each $x \in X$ is
independently drawn from some distribution $\mathcal{D}$. Let $L$ be the set of possible labels. Classical learning theory
yields probably approximately correct (PAC) bounds, bounding the number $n$ of examples drawn i.i.d. from $\mathcal{D}$
needed to output a hypothesis $h$ that will have expected error at most $\varepsilon$ with probability at least $1 - \delta$, for some
fixed $\varepsilon, \delta > 0$. That is, if $h^*$ is the target hypothesis (with zero error), and $\mathrm{error}(h) := P_{x \sim \mathcal{D}}[h(x) \ne h^*(x)]$
is the error of $h$, we require $P[\mathrm{error}(h) \le \varepsilon] \ge 1 - \delta$. The latter probability is taken with respect to $\mathcal{D}(X)$;
the learned hypothesis $h$ and thus $\mathrm{error}(h)$ depend on it.

A key challenge in active learning is to avoid bias: actively selected examples are no longer i.i.d., and thus 
sample complexity bounds for passive learning no longer apply. If one is not careful, active learning may 
require more samples than passive learning to achieve the same generalization error. One natural approach to
active learning that is guaranteed to perform at least as well as passive learning is pool-based active learning
(McCallum and Nigam, 1998). The idea is to draw $n$ unlabeled examples i.i.d. However, instead of obtaining
all labels, labels are adaptively requested until the labels of all unlabeled examples are implied by the obtained 
labels. Now we have obtained n labeled examples drawn i.i.d., and classical PAC bounds still apply. The key 
question is how to request the labels for the pool to infer the remaining labels as quickly as possible. 

In the case of binary labels $L = \{-1, 1\}$, various authors have considered greedy policies which generalize
binary search (Garey and Graham, 1974; Loveland, 1985; Arkin et al., 1993; Kosaraju et al., 1999; Dasgupta,
2004; Guillory and Bilmes, 2009; Nowak, 2009). The simplest of these, called generalized binary search
(GBS) or the splitting algorithm, works as follows. Define the version space $V$ to be the set of hypotheses
consistent with the observed labels (here we assume that there is no label noise). In the worst-case setting,
GBS selects a query $x \in X$ that minimizes $|\sum_{h \in V} h(x)|$. In the Bayesian setting we assume we are given
a prior $p_H$ over hypotheses; in this case GBS selects a query $x \in X$ that minimizes $|\sum_{h \in V} p_H(h) \cdot h(x)|$.
Intuitively these policies myopically attempt to shrink a measure of the version space (i.e., the cardinality
or the probability mass) as quickly as possible. The former provides an $O(\log |H|)$-approximation for the
worst-case number of queries (Arkin et al., 1993), and the latter provides an $O(\log(1/\min_h p_H(h)))$-approximation
for the expected number of queries (Kosaraju et al., 1999; Dasgupta, 2004), and a natural generalization of
GBS obtains the same guarantees with a larger set of labels (Guillory and Bilmes, 2009). Kosaraju et al. also
prove that running GBS on a modified prior $p'_H(h) \propto \max\{p_H(h), 1/(|H|^2 \log |H|)\}$ is sufficient to obtain an
$O(\log |H|)$-approximation.
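
A minimal sketch of the Bayesian variant of GBS for binary labels follows, assuming noise-free labels and a finite pool of queries. The function signature, the threshold hypothesis class, and the pool `range(1, 8)` are illustrative assumptions rather than anything prescribed by the paper.

```python
def generalized_binary_search(hypotheses, prior, queries, target):
    """Bayesian GBS sketch. `hypotheses` maps a name to a function x -> {-1, +1},
    `prior` maps a name to its probability, `target` names the true hypothesis.
    Returns (final version space, queries asked in order)."""
    version_space = set(hypotheses)
    remaining = list(queries)
    asked = []
    while len(version_space) > 1 and remaining:
        # For binary labels, the most balanced prior-weighted vote is the query
        # with the largest expected reduction in version space mass.
        x = min(remaining, key=lambda q: abs(sum(prior[h] * hypotheses[h](q)
                                                 for h in version_space)))
        remaining.remove(x)
        asked.append(x)
        label = hypotheses[target](x)  # noise-free answer from the expert
        version_space = {h for h in version_space if hypotheses[h](x) == label}
    return version_space, asked

# Eight threshold hypotheses on the line, uniform prior, integer query pool 1..7.
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
hypotheses = {t: (lambda x, t=t: 1 if x >= t else -1) for t in thresholds}
prior = {t: 1 / len(thresholds) for t in thresholds}
version_space, asked = generalized_binary_search(hypotheses, prior, range(1, 8), target=5.5)
```

On this instance GBS reduces to ordinary binary search: it queries 4, then 6, then 5, and identifies the target after three labels.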

Viewed from the perspective of the previous sections, shrinking the version space amounts to "covering"
all false hypotheses with stochastic sets (i.e., queries), where query $x$ covers all hypotheses that disagree with
the target hypothesis $h^*$ at $x$. That is, $x$ covers $\{h : h(x) \ne h^*(x)\}$. As in §8, these sets may be correlated in
complex ways determined by the set of possible hypotheses. As we will show, the reduction in version space
mass is adaptive submodular, and this allows us to obtain a new analysis of GBS using adaptive submodularity,
which is arguably more amenable to extensions and generalizations than previous analyses. Our new analysis
further allows us to improve on the previous best bound on the approximation factor of GBS (Dasgupta,
2004) from $4 \ln(1/\min_h p_H(h))$ to $\ln(1/\min_h p_H(h)) + 1$. We also show that when we apply GBS to a modified
prior distribution, the approximation factor is improved to $O(\ln |H|)$. This result matches a lower bound of
$\Omega(\ln |H|)$ of Chakaravarthy et al. (2007) up to constant factors.

Figure 4: Illustration of the Active Learning problem, in the simple special case of one-dimensional data and
binary threshold hypotheses $H = \{h_\tau : \tau \in \mathbb{R}\}$, where $h_\tau(x) = 1$ if $x \ge \tau$ and $0$ otherwise.



Theorem 18 In the Bayesian setting in which there is a prior $p_H$ on a finite set of hypotheses $H$, the
generalized binary search algorithm makes at most $\mathrm{OPT} \cdot (\ln(1/\min_h p_H(h)) + 1)$ queries in expectation to identify a
hypothesis drawn from $p_H$, where OPT is the minimum expected number of queries made by any policy. If
$\min_h p_H(h)$ is sufficiently small, running the algorithm on a modified prior $p'_H(h) \propto \max\{p_H(h), 1/(|H|^2 \log |H|)\}$
improves the approximation factor to $O(\ln |H|)$.

Proof We first address the important special case of a uniform prior over hypotheses, i.e., $p_H(h) = 1/|H|$
for all $h \in H$, and then we reduce the case with a general prior to a uniform prior. We wish to appeal to
Theorem 10, so we convert the problem into an Adaptive Stochastic Min Cost Cover problem. Define a
realization $\Phi_h$ for each hypothesis $h \in H$. The ground set is $E = X$, and the outcomes are binary; we
define $O = \{-1, 1\}$ instead of using $\{0, 1\}$ to be consistent with our earlier exposition. For all $h \in H$ we
set $\Phi_h \equiv h$, meaning $\Phi_h(x) = h(x)$ for all $x \in X$. To define the objective function, we first need some
notation. Given observed labels $\Psi \subseteq X \times O$, let $V(\Psi)$ denote the version space, i.e., the set of hypotheses for
which $h(x) = \Psi(x)$ for all $x \in \mathrm{dom}(\Psi)$. See Fig. 4 for an illustration of an active learning problem in the
case of indicator hypotheses. For a set of hypotheses $V$, let $p_H(V) := \sum_{h \in V} p_H(h)$ denote their total prior
probability. Finally, let $\Psi(S, h) = \{(x, h(x)) : x \in S\}$ be the function with domain $S$ that agrees with $h$ on
$S$. We define the objective function by

$$f(S, \Phi_h) := 1 - p_H(V(\Psi(S, h))) = p_H(\{h' : \exists x \in S, \ h'(x) \ne h(x)\})$$

and use $P[\Phi_h] = p_H(h) = 1/|H|$ for all $h$. Let $\pi^*$ be an optimal policy for this Adaptive Stochastic Min Cost
Cover instance. Note that there is an exact correspondence between policies for the original problem of finding
the target hypothesis and our problem of covering the true realization; identifying $h^*$ as the target hypothesis
corresponds to covering $\Phi_{h^*}$. Hence $c_{\mathrm{avg}}(\pi^*) = \mathrm{OPT}$. Note that because we have assumed a uniform prior
over hypotheses, we have $f(X, \Phi_h) = 1 - 1/|H|$ for all $h$. Also, maximizing the conditional expected
reward forces the policy to identify $h^*$ and hence $\Phi_{h^*}$, and this ensures that these instances are self-certifying.
More formally, these instances are self-certifying because for any $\Phi_h$ and $\Psi$ such that $\Phi_h \sim \Psi$, we have
that $f(\mathrm{dom}(\Psi), \Phi_h) = f(X, \Phi_h)$ implies $V(\Psi) = \{h\}$. This in turn means that $\Phi_h$ is the only realization
consistent with $\Psi$, which trivially implies that any realization $\Phi' \sim \Psi$ also has $f(\mathrm{dom}(\Psi), \Phi') = f(X, \Phi')$;
hence the instance is self-certifying.

We next argue that $f$ is adaptive submodular and strongly adaptive monotone with respect to $P[\Phi]$. Each
query $x$ eliminates some subset of hypotheses, and as more queries are performed, the subset of hypotheses
eliminated by $x$ cannot grow. More formally, consider the expected marginal contribution of $x$ under two
partial realizations $\Psi, \Psi'$, where $\Psi$ is a subrealization of $\Psi'$ (i.e., $\Psi \subseteq \Psi'$) and $x \notin \mathrm{dom}(\Psi')$. Let $\Psi[x/o]$ be
the partial realization with domain $\mathrm{dom}(\Psi) \cup \{x\}$ that agrees with $\Psi$ on its domain, and maps $x$ to $o$. For each
$o \in O$, let $a_o := p_H(V(\Psi[x/o]))$, $b_o := p_H(V(\Psi'[x/o]))$. Since a hypothesis eliminated from the version
space cannot later appear in the version space, we have $a_o \ge b_o$ for all $o$. Next, note the expected reduction in
version space mass (and hence the expected marginal contribution) due to selecting $x$ given partial realization
$\Psi$ is

$$\Delta(x \mid \Psi) = \sum_{o \in O} \frac{a_o}{\sum_{o'} a_{o'}} \cdot \left( \sum_{o' \ne o} a_{o'} \right). \qquad (19)$$

The corresponding quantity for $\Psi'$ has $b_o$ substituted for $a_o$ in Eq. (19), for each $o$. To prove adaptive
submodularity we must show $\Delta(x \mid \Psi) \ge \Delta(x \mid \Psi')$, and to do so it suffices to show that $\partial\phi/\partial z_o \ge 0$ for each
$o$ and $z \in \{c \in [0, 1]^O : \sum_o c_o > 0\}$, where $\phi(z) := \left(\sum_{o \ne o'} z_o z_{o'}\right) / \left(\sum_{o'} z_{o'}\right)$ has the same functional
form as the expression for $\Delta(x \mid \Psi)$ in Eq. (19). This is because $\partial\phi/\partial z_o \ge 0$ for each $o$ implies that growing
the version space in any manner cannot decrease the expected marginal benefit of query $x$, and hence shrinking
it in any manner cannot increase the expected marginal benefit of $x$. It is indeed the case that $\partial\phi/\partial z_o \ge 0$ for
each $o$. More specifically, it holds that

$$\frac{\partial \phi}{\partial z_a} = \frac{2 \sum_{b \ne a} z_b^2 + \sum_{(b,c) :\, b \ne c,\, b \ne a,\, c \ne a} z_b z_c}{\left(\sum_b z_b\right)^2} \ge 0,$$

where the second sum ranges over ordered pairs, which can be derived through elementary calculus.
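
For readers who want the calculus spelled out, here is one short route to the sign of the derivative, written under the reading that the numerator of $\phi$ sums $z_o z_{o'}$ over ordered pairs of distinct outcomes. The shorthand $S$ is introduced only for this sketch.

```latex
% Let S := \sum_b z_b. Summing z_o z_{o'} over ordered pairs o \ne o' gives S^2 - \sum_b z_b^2, so
\phi(z) = \frac{S^2 - \sum_b z_b^2}{S} = S - \frac{\sum_b z_b^2}{S}.
% Differentiate with respect to z_a, using \partial S / \partial z_a = 1:
\frac{\partial \phi}{\partial z_a}
  = 1 - \frac{2 z_a S - \sum_b z_b^2}{S^2}
  = \frac{S^2 - 2 z_a S + \sum_b z_b^2}{S^2}
  = \frac{(S - z_a)^2 + \sum_{b \ne a} z_b^2}{S^2} \;\ge\; 0 .
```

Both terms in the final numerator are squares or sums of squares, so nonnegativity is immediate. For two outcomes this reduces to $\partial\phi/\partial z_1 = 2 z_2^2 / (z_1 + z_2)^2$.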

Demonstrating strong adaptive monotonicity amounts to proving that adding labels cannot grow the version
space, which is clear in our model. Hence we can apply Theorem 10 to this self-certifying instance with
maximum reward threshold $Q = 1 - 1/|H|$, and minimum gap $\eta = 1/|H|$, to obtain an upper bound of
$\mathrm{OPT} \, (\ln(|H| - 1) + 1)$ on the number of queries made by the generalized binary search algorithm (which
corresponds exactly to the greedy policy for Adaptive Stochastic Min Cost Cover) under the assumption of a
uniform prior over $H$.
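
The arithmetic behind the uniform-prior bound is short enough to write out. The numerical instance with 1024 hypotheses is an illustrative assumption.

```latex
\ln\frac{Q}{\eta} + 1
  = \ln\frac{1 - 1/|H|}{1/|H|} + 1
  = \ln(|H| - 1) + 1,
\qquad \text{e.g. } |H| = 1024: \ \ln(1023) + 1 \approx 7.93 .
```

So with 1024 equally likely hypotheses, the guarantee says GBS uses at most about 7.93 times the optimal expected number of queries.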

Now consider general priors over $H$. We construct the Adaptive Stochastic Min Cost Cover instance as
before, only we change the objective function to

$$f(S, \Phi_h) := 1 - p_H(V(\Psi(S, h))) + p_H(h).$$

The modified objective is still adaptive submodular, because $(S, \Phi_h) \mapsto p_H(h)$ is clearly so, and because
adaptive submodularity is defined via linear inequalities it is preserved under taking nonnegative linear
combinations. Note that $f(X, \Phi_h) = 1$ for all $\Phi_h$. Showing $f$ is strongly adaptive monotone requires slightly
more work than before. Fix $\Psi$, $x \notin \mathrm{dom}(\Psi)$, and $o \in O$. We must show $E[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le
E[f(\mathrm{dom}(\Psi) \cup \{x\}, \Phi) \mid \Phi \sim \Psi, \Phi(x) = o]$. Plugging in the definition of $f$, the inequality we wish to
prove may be simplified to

$$E[p_H(h) \mid \Phi_h \sim \Psi] - E[p_H(h) \mid \Phi_h \sim \Psi[x/o]] \le p_H(V(\Psi)) - p_H(V(\Psi[x/o])), \qquad (20)$$

where the expectations are taken over $\Phi_h$. Let $V_{\mathrm{elim}} := V(\Psi) \setminus V(\Psi[x/o])$ be the set of hypotheses eliminated
from the version space by the observation $h(x) = o$. Rewriting Eq. (20), we get

$$\sum_{h \in V(\Psi)} \frac{p_H(h)^2}{p_H(V(\Psi))} - \sum_{h \in V(\Psi[x/o])} \frac{p_H(h)^2}{p_H(V(\Psi[x/o]))} \le p_H(V_{\mathrm{elim}}). \qquad (21)$$

Let LHS(21) denote the left hand side of Eq. (21). We prove Eq. (21) as follows.

$$\mathrm{LHS}(21) \le \sum_{h \in V_{\mathrm{elim}}} p_H(h)^2 / p_H(V(\Psi)) \qquad [\text{since } p_H(V(\Psi[x/o])) \le p_H(V(\Psi))]$$
$$\le \sum_{h \in V_{\mathrm{elim}}} p_H(h) \cdot p_H(V(\Psi)) / p_H(V(\Psi)) \qquad [\text{since } h \in V(\Psi) \Rightarrow p_H(h) \le p_H(V(\Psi))]$$
$$= p_H(V_{\mathrm{elim}}).$$

We conclude that $f$ is adaptive submodular and strongly adaptive monotone. Additionally, instances with
general priors are self-certifying for the same reason instances with uniform priors are, namely that if $\Phi_h \sim \Psi$
and $f(\mathrm{dom}(\Psi), \Phi_h) = f(X, \Phi_h)$, then $V(\Psi) = \{h\}$ and $\Phi_h$ is the only realization consistent with $\Psi$, which
means that the instance satisfies the self-certifying condition. Hence we can apply Theorem 10 to this
self-certifying instance with maximum reward threshold $Q = 1$, and minimum gap $\eta = \min_h p_H(h)$, to
obtain an upper bound of $\mathrm{OPT} \, (\ln(1/\min_h p_H(h)) + 1)$ on the number of queries made by the generalized
binary search algorithm.

To improve this to an $O(\log |H|)$-approximation in the event that $\min_h p_H(h)$ is extremely small using the
observation of Kosaraju et al. (1999), call a policy $\pi$ progressive if it eliminates at least one hypothesis from its
version space in each query. Let $p'_H(h) = \max\{p_H(h), 1/|H|^2\} / \sum_{h'} \max\{p_H(h'), 1/|H|^2\}$ be the modified
prior. Let $c(\pi, h)$ be the cost (i.e., number of queries) of $\pi$ under target $h$. Then $c_{\mathrm{avg}}(\pi, p) := \sum_h c(\pi, h) p(h)$ is
the expected cost of $\pi$ under prior $p$. We will show that $c_{\mathrm{avg}}(\pi, p'_H)$ is a good approximation to $c_{\mathrm{avg}}(\pi, p_H)$. Call
$h$ rare if $p_H(h) < 1/|H|^2$ and common otherwise. First, note that $\sum_{h'} \max\{p_H(h'), 1/|H|^2\} \le 1 + 1/|H|$,
and so $p'_H(h) \ge \frac{|H|}{|H|+1} p_H(h)$ for all $h$. Hence for all $\pi$, we have $c_{\mathrm{avg}}(\pi, p'_H) \ge \frac{|H|}{|H|+1} c_{\mathrm{avg}}(\pi, p_H)$. Next, we
show $c_{\mathrm{avg}}(\pi, p'_H) \le c_{\mathrm{avg}}(\pi, p_H) + 1$. Consider $c_{\mathrm{avg}}(\pi, p'_H) - c_{\mathrm{avg}}(\pi, p_H) = \sum_h c(\pi, h) \, (p'_H(h) - p_H(h))$.
The positive contributions must come from rare hypotheses. However, the total probability mass of these
under $p'_H$ is at most $1/|H|$, and since $\pi$ is progressive we have $c(\pi, h) \le |H|$ for all $h$, hence the difference in
costs is at most one. Let $\alpha := \ln(1/\min_h p'_H(h)) + 1 \le \ln(|H|^2 + |H|) + 1$ be the approximation factor for
generalized binary search when run on $p'_H$. Let $\pi$ be the policy of generalized binary search, and let $\pi^*$ be an
optimal policy under prior $p_H$. Then

$$c_{\mathrm{avg}}(\pi, p_H) \le \frac{|H|+1}{|H|} \, c_{\mathrm{avg}}(\pi, p'_H) \le \frac{|H|+1}{|H|} \, \alpha \, c_{\mathrm{avg}}(\pi^*, p'_H) \le \frac{|H|+1}{|H|} \, \alpha \, (c_{\mathrm{avg}}(\pi^*, p_H) + 1).$$

With some further algebra, we can derive $c_{\mathrm{avg}}(\pi, p_H) \le (c_{\mathrm{avg}}(\pi^*, p_H) + 1) \ln(2e|H|^2)$. Thus for a general
prior a simple modification of generalized binary search yields an $O(\log |H|)$-approximation. ■
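
The modified prior used in this argument is simple to compute. The sketch below raises every mass below the floor of one over the squared number of hypotheses up to that floor, then renormalizes, using exact fractions. The function name `modified_prior` and the four-hypothesis example are hypothetical.

```python
from fractions import Fraction

def modified_prior(prior):
    """Raise every prior mass below 1/|H|^2 up to that floor, then renormalize."""
    n = len(prior)
    floor = Fraction(1, n * n)
    raised = {h: max(Fraction(p), floor) for h, p in prior.items()}
    total = sum(raised.values())
    return {h: p / total for h, p in raised.items()}

# One dominant hypothesis and three rare ones (each below 1/16).
prior = {"a": Fraction(997, 1000), "b": Fraction(1, 1000),
         "c": Fraction(1, 1000), "d": Fraction(1, 1000)}
new_prior = modified_prior(prior)
n = len(prior)
```

In this example the smallest modified mass is 125/2369, about 0.053, which respects the lower bound of 1/20 implied by the proof, so the logarithmic factor depends only on the number of hypotheses.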



Extensions to Arbitrary Costs, Multiple Classes, and Approximate Greedy Policies.
This result easily generalizes to handle the multi-class setting (i.e., $|O| \ge 2$), and $\alpha$-approximate greedy
policies, where we lose a factor of $\alpha$ in the approximation factor. As we describe in the Appendix, we can
generalize adaptive submodularity to incorporate costs on items, which allows us to extend this result to handle
query costs as well. We can therefore recover these extensions of Guillory and Bilmes (2009), while improving
the approximation factor for GBS with item costs to $\ln(1/\min_h p_H(h)) + 1$. Guillory and Bilmes also showed
how to extend the technique of Kosaraju et al. (1999) to obtain an $O(\log(|H| \cdot \max_x c(x) / \min_x c(x)))$-approximation
with costs using a greedy policy, which may be combined with our tighter analysis as well to give a similar
result with an improved leading constant. Recently, Gupta et al. (2010) showed how to simultaneously remove
the dependence on both costs and probabilities from the approximation ratio. Specifically, within the context
of studying an adaptive travelling salesman problem they investigated the Optimal Decision Tree problem,
which is equivalent to the active learning problem we consider here. Using a clever, more complex algorithm
than adaptive greedy, they achieve an $O(\log |H|)$-approximation in the case of non-uniform costs and general
priors.
10. Experiments

Greedy algorithms are often straightforward to develop and implement, which explains their popular use in
practical applications, such as Bayesian experimental design and Active Learning, as discussed in §9 (also
see the excellent introduction of Nowak (2009)) and Adaptive Stochastic Set Cover, e.g., for filter design
in streaming databases as discussed earlier. Besides allowing us to prove approximation guarantees for such
algorithms, adaptive submodularity provides the following immediate practical benefits:

1. The ability to use lazy evaluations to speed up its execution.

2. The ability to generate data-dependent bounds on the optimal value. 

In this section, we empirically evaluate their benefits within a sensor selection application, in a setting similar
to the one described by Deshpande et al. (2004). In this application, we have deployed a network $V$ of
wireless sensors, e.g., to monitor temperature in a building or traffic in a road network. Since sensors are
battery constrained, we must adaptively select k sensors, and then, given those sensor readings, predict, 
e.g., the temperature at all remaining locations. This prediction is possible since temperature measurements 
will typically be correlated across space. Here, we will consider the case where sensors can fail to report 
measurements due to hardware failures, environmental conditions or interference. 
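
The lazy-evaluation speedup evaluated in this section can be sketched in the simpler non-adaptive setting. The code below is an illustrative assumption, not the paper's accelerated algorithm: it keeps stale marginal gains in a priority queue and re-evaluates an item only when it reaches the top, which is valid because submodularity makes stale gains upper bounds. In the adaptive setting the same reasoning would rely on adaptive submodularity of the conditional expected marginal benefits. The names `lazy_greedy`, `coverage_gain`, and `sets` are hypothetical.

```python
import heapq

def lazy_greedy(items, gain, k):
    """Select up to k items greedily. gain(item, chosen) must be submodular in
    `chosen`. Returns (chosen items in order, number of gain evaluations)."""
    chosen, evals = [], 0
    heap = []
    for it in items:
        # Entries are (negated gain, item, number of chosen items when evaluated).
        heap.append((-gain(it, chosen), it, 0))
        evals += 1
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        _, it, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # The bound is fresh and beats every stale upper bound: true argmax.
            chosen.append(it)
        else:
            heapq.heappush(heap, (-gain(it, chosen), it, len(chosen)))
            evals += 1
    return chosen, evals

# Toy coverage objective: the gain of an item is the number of new elements it covers.
sets = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}

def coverage_gain(item, chosen):
    covered = set().union(*(sets[c] for c in chosen))
    return len(sets[item] - covered)

picked, evaluations = lazy_greedy(sorted(sets), coverage_gain, k=2)
```

On this toy instance the lazy variant uses 6 gain evaluations where a naive greedy pass over all remaining items would use 4 + 3 = 7, because item `d` never needs to be re-evaluated in the second round.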

The Sensor Selection Problem with Unreliable Sensors. More formally, we imagine every location $v \in V$
is associated with a random variable $X_v$ describing the temperature at that location, and there is a joint probability
distribution $P[X_V]$ that models the correlation between temperature values. Here, $X_V = [X_1, \ldots, X_n]$
is the random vector over all temperature values. We follow Deshpande et al. (2004) and assume that the joint
distribution of the sensors is multivariate Gaussian. A sensor $v$ can make a noisy observation $Y_v = X_v + \varepsilon_v$,
where $\varepsilon_v$ is zero mean Gaussian noise with known variance $\sigma^2$. If some measurements $Y_A = y_A$ are obtained
at a subset of locations, then the conditional distribution $P[X_V \mid Y_A = y_A]$ allows predictions at the
unobserved locations, e.g., by predicting $E[X_V \mid Y_A = y_A]$ (which minimizes the mean squared error).
Furthermore, this conditional distribution quantifies the uncertainty in the prediction: Intuitively, we would
like to select sensors that minimize the predictive uncertainty. One way to quantify the predictive uncertainty
is to use the remaining Shannon entropy

$$H(X_V \mid Y_A = y_A) := E[-\log_2(P[X_V \mid Y_A = y_A])].$$
We would like to adaptively select $k$ sensors, to maximize the expected reduction in Shannon entropy (e.g.,
Sebastiani and Wynn (2000)). However, in practice, sensors are often unreliable, and might fail to report
their measurements. We assume that after selecting a sensor, we find out whether it has failed or not before
deciding which sensor to select next. We suppose that each sensor has an associated probability $p_{\mathrm{fail}}(v)$ of
failure, in which case no reading is reported, and that sensor failures are independent of each other and of the
ambient temperature at $v$. Thus we have an instance of the Stochastic Maximization problem with $E := V$,
$O := \{\text{working}, \text{failed}\}$, and

$$f(A, \Phi) := H(X_V) - H(X_V \mid Y_{\{v \in A \,:\, \Phi(v) = \text{working}\}}). \qquad (22)$$

For multivariate normal distributions, the entropy is given as

$$\mathbb{H}(X_V \mid Y_A = y_A) = \frac{1}{2} \ln\left( (2\pi e)^n \left| \Sigma_{VV} - \Sigma_{VA} (\Sigma_{AA} + \sigma^2 I)^{-1} \Sigma_{AV} \right| \right),$$

where for sets $A$ and $B$, $\Sigma_{AB}$ denotes the covariance (matrix) between random vectors $X_A$ and $X_B$. Note that
the predictive covariance does not depend on the actual observations $y_A$, only on the set $A$ of chosen locations.
Thus,

$$\mathbb{H}(X_V \mid Y_A = y_A) = \mathbb{H}(X_V \mid Y_A),$$

where as usual, $\mathbb{H}(X_V \mid Y_A) = \mathbb{E}[\mathbb{H}(X_V \mid Y_A = y_A)]$. As Krause and Guestrin (2005) show, the function

$$g(A) := \mathbb{I}(X_V; Y_A) = \mathbb{H}(X_V) - \mathbb{H}(X_V \mid Y_A) \qquad (23)$$

is monotone submodular, whenever the observations $Y_V$ are conditionally independent given $X_V$.

This insight allows us to apply the result of §6 to show that the objective $f$ defined in Eq. (22) is adaptive
monotone submodular, using $\hat{f}(S) := g(\{v : (v, \text{working}) \in S\})$ for any $S \subseteq E \times O$.
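
As a rough illustration of Eqs. (22) and (23), the following sketch evaluates the information gain for a Gaussian model in plain Python. It is not the code used in the experiments. It relies on the identity $\mathbb{I}(X_V; Y_A) = \frac{1}{2} \ln \det(I + \sigma^{-2} \Sigma_{AA})$, which follows from $\mathbb{I}(X_V; Y_A) = \mathbb{H}(Y_A) - \mathbb{H}(Y_A \mid X_V)$ under the noise model above and avoids forming the conditional covariance. The function names, the list-of-lists covariance, and the dictionary encoding of a realization are assumptions made for this example; values are in nats.

```python
import math

def log_det_spd(m):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def information_gain(cov, noise_var, chosen):
    """g(A) = I(X_V; Y_A) = 0.5 * ln det(I + cov[A, A] / noise_var), in nats."""
    idx = sorted(set(chosen))
    if not idx:
        return 0.0
    m = [[cov[a][b] / noise_var + (1.0 if a == b else 0.0) for b in idx]
         for a in idx]
    return 0.5 * log_det_spd(m)

def adaptive_objective(cov, noise_var, selected, realization):
    """f(A, Phi): only sensors whose state is 'working' contribute readings."""
    working = [v for v in selected if realization[v] == "working"]
    return information_gain(cov, noise_var, working)
```

For example, `adaptive_objective(cov, 1.0, [0, 1], {0: "working", 1: "failed"})` equals `information_gain(cov, 1.0, [0])`, since the failed sensor reports nothing.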
Data and Experimental Setup. Our first data set consists of temperature measurements from the network of
46 sensors deployed at Intel Research Berkeley, which were sampled at 30 second intervals for 5 consecutive
days (starting Feb. 28th, 2004). We define our objective function with respect to the empirical covariance
estimated from the data. 

We also use data from traffic sensors deployed along the highway I-880 South in California. We use traffic
speed data for all working days from 6 AM to 11 AM for one month, from 357 sensors. The goal is to predict 
the speed on all 357 road segments. We again estimate the empirical covariance matrix. 

The Benefits of Lazy Evaluation. For both data sets, we run the adaptive greedy algorithm, using both the
naive implementation (Algorithm 1) and the accelerated version using lazy evaluations (Algorithm 2). We
vary the probability of sensor failure, and compute the number of evaluations of the function $g$ (defined in
Eq. (23)) each algorithm makes. (These function evaluations are the bottleneck in the computation, so the
number of them serves as a machine-independent proxy for the running time.) Figures 5(a) and 5(c) show the
performance. On the temperature data set, lazy evaluations speed up the computation by a factor of between 
roughly 3.5 and 7, depending on the failure probability. On the larger traffic data set, we obtain speedup 
factors between 30 and 38. We find that the benefit of the lazy evaluations increases with the problem size 
and with the failure probability. The dependence on problem size must ultimately be explained in terms of 
structural properties of the instances, which also benefit the nonadaptive accelerated greedy algorithm. The 
dependence on failure probability has a simpler explanation. Note that in these applications, if the accelerated 
greedy algorithm selects v, which then fails, then it does not need to make any additional function evaluations 
to select the next sensor. Contrast this with the naive greedy algorithm, which makes a function evaluation for 
each sensor that has not been selected so far. 
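
The difference in evaluation counts can be reproduced on a toy instance. The sketch below is a generic lazy-evaluation variant written for illustration, not a transcription of Algorithm 2: stale marginal benefits are kept in a heap as upper bounds (valid under adaptive submodularity), and an item is selected once its freshly computed benefit reaches the top of the heap. The weighted-coverage objective, the constants `WEIGHTS`, `COVERS`, `P_FAIL`, and `STATE`, and all function names are hypothetical.

```python
import heapq

def naive_adaptive_greedy(items, k, marginal, observe):
    """Select k items, re-evaluating every unselected item in each round."""
    psi, evals = {}, 0
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for e in items:
            if e in psi:
                continue
            evals += 1
            gain = marginal(e, psi)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break
        psi[best] = observe(best)
    return psi, evals

def lazy_adaptive_greedy(items, k, marginal, observe):
    """Same selection rule, but stale gains are reused as upper bounds."""
    psi, evals = {}, 0
    # Heap entries: (-upper_bound, tie_break, item, round the bound was computed in).
    heap = [(float("-inf"), i, e, -1) for i, e in enumerate(items)]
    heapq.heapify(heap)
    for rnd in range(k):
        chosen = None
        while heap:
            _, i, e, stamp = heapq.heappop(heap)
            if stamp == rnd:
                # Exact for the current psi and at least every other upper bound.
                chosen = e
                break
            evals += 1
            heapq.heappush(heap, (-marginal(e, psi), i, e, rnd))
        if chosen is None:
            break
        psi[chosen] = observe(chosen)
    return psi, evals

# Hypothetical toy instance: weighted coverage with independent sensor failures.
WEIGHTS = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.5, "e": 1.25, "f": 0.75}
COVERS = {0: "abc", 1: "cd", 2: "e", 3: "a", 4: "bf"}
P_FAIL = 0.5
STATE = {0: "working", 1: "failed", 2: "working", 3: "working", 4: "working"}

def coverage(sensors):
    """Total weight of the elements covered by the given sensors."""
    return sum(WEIGHTS[x] for x in set().union(*(COVERS[s] for s in sensors)))

def marginal(e, psi):
    """Conditional expected marginal benefit of e given the observations psi."""
    working = [s for s, outcome in psi.items() if outcome == "working"]
    return (1.0 - P_FAIL) * (coverage(working + [e]) - coverage(working))

naive_psi, naive_evals = naive_adaptive_greedy(list(COVERS), 3, marginal, STATE.get)
lazy_psi, lazy_evals = lazy_adaptive_greedy(list(COVERS), 3, marginal, STATE.get)
print(naive_psi == lazy_psi, naive_evals, lazy_evals)
```

Both routines count calls to `marginal`, which plays the role of the function evaluations counted in the experiments. After a failure the set of working sensors is unchanged, so the stored bounds are already exact, although this simplified variant still spends one evaluation to confirm the top item.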

The Benefits of the Data Dependent Bound. While adaptive submodularity allows us to prove worst-case
performance guarantees for the adaptive greedy algorithm, in many practical applications it can be expected
that these bounds are quite loose. For our sensor selection application, we use the data dependent bounds
of Lemma 6 to compute an upper bound $\beta_{\text{avg}}$ on $\max_\pi f_{\text{avg}}(\pi_{[k]})$ as described below, and compare it with
the performance guarantee of Theorem 5. For the accelerated greedy algorithm, we use the upper bounds
on the marginal benefits stored in the priority queue instead of recomputing the marginal benefits, and thus
expect somewhat looser bounds. We find that for our application, the data dependent bounds are tighter than
the worst case bounds. We also find that the "lazy" data dependent bounds are almost as tight as the "eager"
bounds using the eagerly recomputed marginal benefits $\Delta(e \mid \psi)$ for the latest and greatest $\psi$. Figures 5(b)
and 5(d) show the performance of the greedy algorithm as well as the three bounds on the optimal value.

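Two subtleties arise when using the data-dependent bounds to bound $\max_\pi f_{\text{avg}}(\pi_{[k]})$. The first is that
Lemma 6 tells us that $\Delta(\pi^* \mid \psi) \le \max_{A \subseteq E, |A| \le k} \sum_{e \in A} \Delta(e \mid \psi)$, whereas we would like to bound the
difference between the optimal reward and the algorithm's current expected reward, conditioned on seeing
$\psi$, i.e., $\mathbb{E}[f(E(\pi^*_{[k]}, \Phi), \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi]$. However, in our applications $f$ is strongly adaptive
monotone, and strong adaptive monotonicity implies that for any $\pi^*$ we have

$$\mathbb{E}\left[f(E(\pi^*_{[k]}, \Phi), \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi\right] \le \Delta(\pi^*_{[k]} \mid \psi). \qquad (24)$$

Hence, if we let $\mathrm{OPT}(\psi) := \max_\pi \mathbb{E}[f(E(\pi_{[k]}, \Phi), \Phi) \mid \Phi \sim \psi]$, Lemma 6 implies that

$$\mathrm{OPT}(\psi) \le \mathbb{E}[f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi] + \max_{A \subseteq E, |A| \le k} \sum_{e \in A} \Delta(e \mid \psi). \qquad (25)$$

The second subtlety is that we obtain a sequence of bounds from Eq. (25). If we consider the (random)
sequence of partial realizations observed by the adaptive greedy algorithm, $\emptyset = \psi_0 \subset \psi_1 \subset \cdots \subset \psi_k$, we
obtain $k + 1$ bounds $\beta_0, \ldots, \beta_k$, where $\beta_i := \mathbb{E}[f(\mathrm{dom}(\psi_i), \Phi) \mid \Phi \sim \psi_i] + \max_{A \subseteq E, |A| \le k} \sum_{e \in A} \Delta(e \mid \psi_i)$.
Taking the expectation over $\Phi$, note that for any $\pi$, and any $i$,

$$f_{\text{avg}}(\pi_{[k]}) \le \mathbb{E}[\mathrm{OPT}(\psi_i)] \le \mathbb{E}[\beta_i].$$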
Figure 5: Experimental results.
(a) Temperature Data: The ratio of function evaluations made by the naive vs. accelerated implementations of adaptive greedy vs. the budget $k$ on number of sensors selected, for various failure rates. Averaged over 100 runs.
(b) Temperature Data: Rewards & bounds on the optimal value when $p_{\text{fail}}(v) = 0.5$ for all $v$ vs. the budget $k$ on number of sensors selected. Averaged over 100 runs.
(c) Traffic Data: The ratio of function evaluations made by the naive vs. accelerated implementations of adaptive greedy vs. the budget $k$ on number of sensors selected, for various failure rates. Averaged over 10 runs.
(d) Traffic Data: Rewards & bounds on the optimal value when $p_{\text{fail}}(v) = 0.5$ for all $v$ vs. the budget $k$ on number of sensors selected. Averaged over 10 runs.



Therefore for any $0 \le i \le k$, $\beta_i$ is a random variable whose expectation is an upper bound on the
optimal expected reward of any policy. At this point we may be tempted to use the minimum of these, i.e.,
$\beta_{\min} := \min_i \{\beta_i\}$ as our ultimate bound. However, a collection of random variables $X_0, \ldots, X_k$ with
$\mathbb{E}[X_i] \ge \tau$ for all $i$ does not, in general, satisfy $\min_i \{X_i\} \ge \tau$. While it is possible in our case, with its
independent sensor failures, to use concentration inequalities to bound $\min_i \{\beta_i\} - \min_i \{\mathbb{E}[\beta_i]\}$ with high
probability, and thus add an appropriate term to obtain a true upper bound from $\beta_{\min}$, we take a different
approach; we simply use the average bound $\beta_{\text{avg}} := \frac{1}{k+1} \sum_{i=0}^{k} \beta_i$. Of course, depending on the application, a
particular bound $\beta_i$ (chosen independently of the sequence $\psi_0, \psi_1, \ldots, \psi_k$) may be superior. For example,
if $g$ is modular, then $\beta_0$ is best, whereas if $g$ exhibits strong diminishing returns, then bounds $\beta_i$ with larger
values of $i$ may be significantly tighter.
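
With unit costs, the maximization in Eq. (25) is simply the sum of the $k$ largest positive marginal benefits, so each bound and their average are cheap to compute. A minimal sketch follows, assuming the caller supplies the current conditional expected reward and the list of marginal benefits at each step; the function names are illustrative.

```python
def bound_at_step(current_value, marginals, k):
    """beta_i: current expected reward plus the k largest positive marginal benefits."""
    top = sorted((m for m in marginals if m > 0), reverse=True)[:k]
    return current_value + sum(top)

def average_bound(betas):
    """beta_avg: the mean of the k + 1 per-step bounds beta_0, ..., beta_k."""
    return sum(betas) / len(betas)
```

Averaging preserves validity in expectation because each per-step bound has expectation at least the optimal expected reward, whereas the minimum of the per-step bounds does not have this property in general.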
11. Adaptivity Gap



An important question in adaptive optimization is how much better adaptive policies can perform when 
compared to non-adaptive policies. This is quantified by the adaptivity gap, which is the worst-case ratio, over 
problem instances, of the performance of the optimal adaptive policy to the optimal non-adaptive solution. 



Asadpour et al. (2008) show that in the Stochastic Submodular Maximization problem with independent
failures (as considered in §6), the expected value of the optimal non-adaptive policy is at most a constant
factor $1 - 1/e$ worse than the expected value of the optimal adaptive policy. While we currently do not
have lower bounds for the adaptivity gap of the general Adaptive Stochastic Maximization problem (1), we
can show that even in the case of adaptive submodular functions, the min-cost cover and min-sum cover
versions have large adaptivity gaps, and thus there is a large benefit of using adaptive algorithms. In these
cases, the adaptivity gap is defined as the worst-case ratio of the expected cost of the optimal non-adaptive
policy divided by the expected cost of the optimal adaptive policy. For the Adaptive Stochastic Minimum Cost
Coverage problem (2), Goemans and Vondrák (2006) show the special case of Stochastic Set Coverage without
multiplicities has an adaptivity gap of $\Omega(|E|)$. Below we exhibit an adaptive stochastic optimization instance
with adaptivity gap of $\Omega(|E| / \log |E|)$ for the Adaptive Stochastic Min-Sum Cover problem (3), which also
happens to have the same adaptivity gap for Adaptive Stochastic Minimum Cost Coverage.

Theorem 19 Even for adaptive submodular functions, the adaptivity gap of Adaptive Stochastic Min-Sum
Cover is $\Omega(n / \log n)$, where $n = |E|$.

Proof Suppose $E = \{1, \ldots, n\}$. Consider the active learning problem where our hypotheses $h : E \to \{-1, 1\}$
are threshold functions, i.e., $h(e) = 1$ if $e \ge \ell$ and $h(e) = -1$ if $e < \ell$ for some threshold $\ell$. There is a
uniform distribution over thresholds $\ell \in \{1, \ldots, n + 1\}$. In order to identify the correct hypothesis with
threshold $\ell$, our policy must observe at least one of $\ell - 1$ or $\ell$ (typically both for $1 < \ell \le n$). Let $\pi_N$ be
any non-adaptive policy, which can be represented as a permutation of $E$. It can be seen that the optimal
non-adaptive policy must be a permutation; observing the same element multiple times can only increase the
cost, and each element must eventually be selected to guarantee coverage. For the min-sum cover objective,
consider playing $\pi_N$ for $n/2$ time steps. It can be seen that the probability that the correct hypothesis has been
identified is less than $1/2$. Thus a lower bound on the expected cost of $\pi_N$ is $n/4$, since for $n/2$ time steps, at
each time step a cost of at least $1/2$ is incurred. Thus, for both the min-cost and min-sum cover objectives the
cost of the optimal non-adaptive policy is $\Omega(n)$.

As an example adaptive policy, we can implement a natural binary search strategy, which is guaranteed to
identify the correct hypothesis after $O(\log n)$ steps, thus incurring cost $O(\log n)$, proving an adaptivity gap of
$\Omega(n / \log n)$. ■
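
The two policies in this proof are easy to simulate. The sketch below is illustrative only: `binary_search_policy` is the adaptive strategy, and `nonadaptive_steps` reports how many steps a fixed query order needs before it has observed every element required to pin down the threshold. Both function names and the `query` callback are assumptions made for the example.

```python
import math

def binary_search_policy(n, query):
    """Identify the threshold in {1, ..., n + 1}; query(e) returns h(e) for e in 1..n.

    Returns the identified threshold and the number of queries issued.
    """
    lo, hi = 1, n + 1          # invariant: lo <= threshold <= hi
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2   # always lies in [lo, hi - 1], hence in 1..n
        queries += 1
        if query(mid) == 1:    # h(mid) = 1 means mid >= threshold
            hi = mid
        else:                  # h(mid) = -1 means mid < threshold
            lo = mid + 1
    return lo, queries

def nonadaptive_steps(order, threshold, n):
    """Steps until a fixed order has observed threshold - 1 and threshold (where they exist)."""
    needed = {e for e in (threshold - 1, threshold) if 1 <= e <= n}
    position = {e: step for step, e in enumerate(order, start=1)}
    return max(position[e] for e in needed)
```

Averaging `nonadaptive_steps` over all thresholds for any fixed order gives a quantity that grows linearly in `n`, while `binary_search_policy` never issues more than `math.ceil(math.log2(n + 1))` queries.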



12. Hardness of Approximation 

In this paper, we have developed the notion of adaptive submodularity, which characterizes when certain 
adaptive stochastic optimization problems are well-behaved in the sense that a simple greedy policy obtains a 
constant factor or logarithmic factor approximation to the best policy. 

In contrast, we can also show that without adaptive submodularity, the adaptive stochastic optimization
problems (1), (2), and (3) are extremely inapproximable, even with (pointwise) modular objective functions
(i.e., those where for each $\Phi$, $f : 2^E \times O^E \to \mathbb{R}$ is modular/linear in the first argument): We cannot hope
to achieve an $O(|E|^{1-\varepsilon})$ approximation ratio for these problems, unless the polynomial hierarchy collapses
down to $\Sigma_2^P$.

Theorem 20 In general, for all (possibly non-constant) $\beta \ge 1$, no polynomial time algorithm for Adaptive
Stochastic Maximization with a budget of $\beta k$ items can approximate the reward of an optimal policy with a
budget of only $k$ items to within a multiplicative factor of $O(|E|^{1-\varepsilon} / \beta)$ for any $\varepsilon > 0$, unless $\mathrm{PH} = \Sigma_2^P$. This
holds even for pointwise linear $f$.
We provide the proof of Theorem 20 in Appendix A.6. Note that by setting $\beta = 1$, we obtain $O(|E|^{1-\varepsilon})$
hardness for Adaptive Stochastic Maximization. It turns out that in the instance distribution we construct in
the proof of Theorem 20 the optimal policy covers every realization (i.e., always finds the treasure) using a
budget of $k = O(|E|^{\varepsilon/4})$ items. Hence if $\mathrm{PH} \ne \Sigma_2^P$ then any randomized polynomial time algorithm wishing
to cover this instance must have a budget $\beta = \Omega(|E|^{1-\varepsilon})$ times larger than the optimal policy, in order to
ensure the ratio of rewards, which is $\Omega(|E|^{1-\varepsilon} / \beta)$, equals one. This yields the following corollary.

Corollary 21 In general, no polynomial time algorithm for Adaptive Stochastic Min Cost Coverage can
approximate the cost of an optimal policy to within a multiplicative factor of $O(|E|^{1-\varepsilon})$ for any $\varepsilon > 0$, unless
$\mathrm{PH} = \Sigma_2^P$. This holds even for pointwise linear $f$.

Furthermore, since in the instance distribution we construct the optimal policy $\pi^*$ covers every realization
using a budget of $k$, it has $c_\Sigma(\pi^*) \le k$. Moreover, since we have shown that under our complexity theoretic
assumptions, any polynomial time randomized policy $\pi$ with budget $\beta k$ achieves at most $o(\beta / |E|^{1-\varepsilon})$ of the
(unit) value obtained by the optimal policy with budget $k$, it follows that $c_\Sigma(\pi) = \Omega(\beta k)$. Since we require
$\beta = \Omega(|E|^{1-\varepsilon})$ to cover any set of realizations constituting, e.g., half of the probability mass, we obtain the
following corollary.

Corollary 22 In general, no polynomial time algorithm for Adaptive Stochastic Min-Sum Cover can
approximate the cost of an optimal policy to within a multiplicative factor of $O(|E|^{1-\varepsilon})$ for any $\varepsilon > 0$, unless
$\mathrm{PH} = \Sigma_2^P$. This holds even for pointwise linear $f$.



13. Related Work 

There is a large literature on adaptive optimization under partial observability that relates to adaptive
submodularity; it can be broadly organized into several categories. Here, we only review relevant
related work that is not already discussed elsewhere in the manuscript.

Adaptive Versions of Classic Non-adaptive Optimization Problems. Many approaches consider stochastic
generalizations of specific classic non-adaptive optimization problems, such as Set Cover (Goemans and
Vondrák 2006; Liu et al. 2008), Knapsack (Dean et al. 2008, 2005) and Traveling Salesman (Gupta et al.
2010). In contrast, in this paper our goal is to introduce a general problem structure, adaptive submodularity,
that unifies a number of adaptive optimization problems. This is similar to how the classic notion of
submodularity unifies various optimization problems such as Set Cover, Facility Location, nonadaptive Bayesian
Experimental Design, etc.



Interactive Submodular Set Cover. Recent work by Guillory and Bilmes (2010) considers a class of
adaptive optimization problems over a family of submodular objectives $\{f_h : h \in H\}$. In their problem, one
must cover a monotone submodular objective $f_{h^*}$ which depends on the (initially unknown) target hypothesis
$h^* \in H$, by adaptively issuing queries and getting responses. Unlike traditional pool-based active learning,
each query may generate a response from a set of valid responses depending on the target hypothesis. The
reward is calculated by evaluating $f_{h^*}$ on the set of (query, response) pairs observed, and the goal is to obtain
some threshold $Q$ of objective value at minimum total query cost, where queries may have nonuniform costs.
Guillory and Bilmes consider the worst-case policy cost, and provide a greedy algorithm optimizing a clever
hybrid objective function and prove it has an approximation guarantee of $\ln(Q|H|) + 1$ for integer valued
objective functions $\{f_h\}_{h \in H}$.

While similar in spirit to this work, there are several significant differences between the two. Guillory and
Bilmes focus on worst-case policy cost, while we focus mainly on average-case policy cost. The structure of
adaptive submodularity depends on $P[\Phi]$, whereas there is no such dependence in Interactive Submodular Set
Cover. This dependence in turn allows us to obtain results, such as Theorem 10 for self-certifying instances,
whose approximation guarantee does not depend on the number of realizations in the way that the guarantees
for Interactive Submodular Set Cover depend on $|H|$. As Guillory and Bilmes prove, the latter dependence is
fundamental under reasonable complexity-theoretic assumptions.³ An interesting open problem within the
adaptive submodularity framework that is highlighted by the work on Interactive Submodular Set Cover is to
identify useful instance-specific properties that are sufficient to improve upon the worst-case approximation
guarantee of Theorem 11.

Greedy Frameworks for Adaptive Optimization. The paper that is perhaps closest in spirit to this work
is the one on Stochastic Depletion problems by Chan and Farias (2009), who also identify a general class of
adaptive optimization problems that can be near-optimally solved using greedy algorithms (which in their
setting gives a factor 2 approximation). However, the similarity is mainly on a conceptual level: The problems
and approaches, as well as example applications considered, are quite different.

Stochastic Optimization with Recourse. A class of adaptive optimization problems studied extensively
in operations research (since Dantzig (1955)) is the area of stochastic optimization with recourse. Here, an
optimization problem, such as Set Cover, Steiner Tree or Facility Location, is presented in multiple stages. At
each stage, more information is revealed, but costs of actions increase. A key difference from the problems studied
in this paper is that in these problems, information gets revealed independently of the actions taken by the
algorithm. There are general efficient, sampling based (approximate) reductions of multi-stage optimization to
the deterministic setting; see, e.g., Gupta et al. (2005).



Bayesian Global Optimization. Adaptive Stochastic Optimization is also related to the problem of Bayesian
Global Optimization (c.f., Brochu et al. (2009) for a recent survey of the area). In Bayesian Global Optimization,
the goal is to adaptively select inputs in order to maximize an unknown function that is expensive to evaluate
(and can possibly only be evaluated using noisy observations). A common approach that has been successful in
many applications (c.f., Lizotte et al. (2007) for a recent application in machine learning), is to assume a prior
distribution, such as a Gaussian process, over the unknown objective function. Several criteria for selecting
inputs have been developed, such as the Expected Improvement (Jones et al. 1998) criterion. However, while
recently performance guarantees were obtained in the no-regret setting (Grünewalder et al. 2010; Srinivas
et al. 2010), we are not aware of any approximation guarantees for Bayesian Global Optimization.



Probabilistic Planning. The problem of decision making under partial observability has also been
extensively studied in stochastic optimal control. In particular, Partially Observable Markov Decision Processes
(POMDPs, Smallwood and Sondik (1973)) are a general framework that captures many adaptive optimization
problems under partial observability. Unfortunately, solving POMDPs is PSPACE hard (Papadimitriou and
Tsitsiklis, 1987), thus typically heuristic approximations with no performance guarantees are applied (Pineau
et al., 2006). For some special instances of POMDPs related to Multi-armed Bandit problems, (near-)optimal
policies can be found. These include the (optimal) Gittins-index policy for the classic Multi-armed Bandit
problem (Gittins and Jones 1979) and approximate policies for the Multi-armed Bandit problem with metric
switching costs (Guha and Munagala, 2009) and special cases of the Restless Bandit problem (Guha et al.
2009). The problems considered in this paper can be formalized as POMDPs, albeit with exponentially large
state space (where the state represents the selected items and observations). Thus our results can be interpreted
as widening the class of partially observable optimization problems that can be efficiently approximately
solved.

Previous Work by the Authors. This manuscript is an extended version of a paper that appeared in the
Conference on Learning Theory (COLT) 2010 (Golovin and Krause 2010). The present version significantly
expands on the previous one, and includes an improved policy-centric treatment of adaptive submodularity,
new theoretical results on adaptive coverage and min-sum coverage, the accelerated adaptive greedy algorithm,
new applications, a new kitchen sink, new hardness and adaptivity gap results, the incorporation of item costs,
and experiments.



3. They reduce to Set Cover and use the result of Feige (1998), which requires the assumption $\mathrm{NP} \not\subseteq \mathrm{DTIME}(n^{O(\log \log n)})$, but it
suffices to assume only $\mathrm{P} \ne \mathrm{NP}$ using the Set Cover approximation hardness result of Raz and Safra (1997) instead.
14. Conclusions



In this paper, we introduced the concept of adaptive submodularity, generalizing submodular set functions to
adaptive policies. Our generalization is based on a natural adaptive analog of the diminishing returns property
well understood for set functions. In the special case of deterministic distributions, adaptive submodularity
reduces to the classical notion of submodular set functions. We proved that several guarantees carried by the
non-adaptive greedy algorithm for submodular set functions generalize to a natural adaptive greedy algorithm
in the case of adaptive submodular functions, for constrained maximization and certain natural coverage
problems with both minimum cost and minimum sum objectives. We also showed how the adaptive greedy
algorithm can be accelerated using lazy evaluations, and how one can compute data-dependent bounds on
the optimal solution. We illustrated the usefulness of the concept by giving several examples of adaptive
submodular objectives arising in diverse applications including sensor placement, viral marketing and
pool-based active learning. Proving adaptive submodularity for these problems allowed us to recover existing
results in these applications as special cases and led to natural generalizations. Our experiments on real data
indicate that adaptive submodularity can provide practical benefits, such as significant speed ups and tighter
data-dependent bounds. We believe that our results provide an interesting step in the direction of exploiting
structure to solve complex stochastic optimization problems under partial observability.
Appendix A. Additional Proofs and Incorporating Item Costs

In this appendix we provide all of the proofs omitted from the main text. For the results of §5 we do so by
first explaining how our results generalize to the case where items have costs, and then proving generalizations
which incorporate item costs.

A.1 Incorporating Costs: Preliminaries

In this section we provide the preliminaries required to define and analyze the versions of our problems with
non-uniform item costs. We suppose each item $e \in E$ has a cost $c(e)$, and the cost of a set $S \subseteq E$ is given
by the modular function $c(S) = \sum_{e \in S} c(e)$. We define the generalizations of problems (1), (2), and (3) in
§A.3, §A.4 and §A.5 respectively.

Our results are with respect to the greedy policy $\pi^{\text{greedy}}$ and $\alpha$-approximate greedy policies. With costs,
the greedy policy selects an item maximizing $\Delta(e \mid \psi) / c(e)$, where $\psi$ is the current partial realization.

Definition 23 (Approximate Greedy Policy with Costs) A policy $\pi$ is an $\alpha$-approximate greedy policy if
for all $\psi$ such that there exists $e \in E$ with $\Delta(e \mid \psi) > 0$,

$$\pi(\psi) \in \left\{ e : \frac{\Delta(e \mid \psi)}{c(e)} \ge \frac{1}{\alpha} \max_{e'} \left( \frac{\Delta(e' \mid \psi)}{c(e')} \right) \right\},$$

and $\pi$ terminates upon observing any $\psi$ such that $\Delta(e \mid \psi) \le 0$ for all $e \in E$. That is, an $\alpha$-approximate
greedy policy always obtains at least $(1/\alpha)$ of the maximum possible ratio of conditional expected marginal
benefit to cost, and terminates when no more benefit can be obtained in expectation. A greedy policy is any
1-approximate greedy policy.
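
The selection condition in Definition 23 can be checked mechanically for a single decision. The sketch below is illustrative only and assumes the marginal benefits $\Delta(e \mid \psi)$ and costs $c(e)$ are given as dictionaries keyed by item; the function name is hypothetical.

```python
def is_alpha_approximate_choice(chosen, marginals, costs, alpha):
    """True if `chosen` attains at least 1/alpha of the best benefit-to-cost ratio."""
    best_ratio = max(marginals[e] / costs[e] for e in marginals)
    if best_ratio <= 0:
        # No item has positive expected benefit: the policy must terminate instead.
        return False
    return marginals[chosen] / costs[chosen] >= best_ratio / alpha
```

With `alpha = 1` only an item achieving the maximum ratio passes, which recovers the exact greedy rule.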

It will be convenient to imagine the policy executing over time, such that when a policy tt selects an item e, 
it starts to run e, and finishes running e after c(e) units of time. We next generalize the definition of policy 
truncation. Actually we require three such generalizations, which are all equivalent in the unit cost case. 

Definition 24 (Strict Policy Truncation) The strict level $t$ truncation of a policy $\pi$, denoted by $\pi_{[\leftarrow t]}$, is
obtained by running $\pi$ for $t$ time units, and unselecting items whose runs have not finished by time $t$. Formally,
$\pi_{[\leftarrow t]}$ has domain $\{\psi \in \mathrm{dom}(\pi) : c(\pi(\psi)) + \sum_{e \in \mathrm{dom}(\psi)} c(e) \le t\}$, and agrees with $\pi$ everywhere in its
domain.

Definition 25 (Lax Policy Truncation) The lax level $t$ truncation of a policy $\pi$, denoted by $\pi_{[t \rightarrow]}$, is obtained
by running $\pi$ for $t$ time units, and selecting the items running at time $t$. Formally, $\pi_{[t \rightarrow]}$ has domain
$\{\psi \in \mathrm{dom}(\pi) : \sum_{e \in \mathrm{dom}(\psi)} c(e) < t\}$, and agrees with $\pi$ everywhere in its domain.

Definition 26 (Policy Truncation with Costs) The level-$t$-truncation of a policy $\pi$, denoted by $\pi_{[t]}$, is a
randomized policy obtained by running $\pi$ for $t$ time units, and if some item $e$ has been running for $0 \le \tau < c(e)$
time at time $t$, selecting $e$ independently with probability $\tau / c(e)$. Formally, $\pi_{[t]}$ is a randomized
policy that agrees with $\pi$ everywhere in its domain, has $\mathrm{dom}(\pi_{[\leftarrow t]}) \subseteq \mathrm{dom}(\pi_{[t]}) \subseteq \mathrm{dom}(\pi_{[t \rightarrow]})$ with
certainty, and includes each $\psi \in \mathrm{dom}(\pi_{[t \rightarrow]}) \setminus \mathrm{dom}(\pi_{[\leftarrow t]})$ in its domain independently with probability

$$\left(t - \sum_{e \in \mathrm{dom}(\psi)} c(e)\right) / c(\pi(\psi)).$$
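
To make the three truncations concrete, consider the items a policy selects under one fixed realization, in order, together with their costs. The sketch below is an illustration under that assumption, with a hypothetical function name. It returns how many items finish by time $t$ (these are kept by all three truncations) and the probability with which the item still running at time $t$ is kept by the randomized truncation of Definition 26.

```python
def truncate_run(costs_in_order, t):
    """Return (n_finished, p_running) for one run of a policy truncated at time t.

    n_finished: number of items whose runs end by time t (the strict truncation).
    p_running:  tau / c(e) for the item e running at time t, where tau is the time
                it has been running; 0.0 if no item is partway through its run.
    """
    elapsed = 0
    for n_finished, cost in enumerate(costs_in_order):
        if elapsed + cost > t:
            tau = max(t - elapsed, 0)
            return n_finished, tau / cost
        elapsed += cost
    return len(costs_in_order), 0.0
```

The lax truncation additionally keeps the running item whenever `p_running` is positive. When the run lasts at least $t$ time units, the expected cost of the randomized truncation is exactly $t$: the finished items cost `elapsed`, and the running item contributes `p_running * cost = t - elapsed` in expectation.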

In the proofs that follow, we will need a notion of the conditional expected cost of a policy, as well as an
alternate characterization of adaptive monotonicity, based on a notion of policy concatenation. We prove the
equivalence of our two adaptive monotonicity conditions in Lemma 30.

Definition 27 (Conditional Policy Cost) The conditional policy cost of $\pi$ conditioned on $\psi$, denoted $c(\pi \mid \psi)$,
is the expected cost of the items $\pi$ selects under $P[\Phi \mid \psi]$. That is, $c(\pi \mid \psi) := \mathbb{E}[c(E(\pi, \Phi)) \mid \Phi \sim \psi]$.

Definition 28 (Policy Concatenation) Given two policies $\pi_1$ and $\pi_2$ define $\pi_1 @ \pi_2$ as the policy obtained
by running $\pi_1$ to completion, and then running policy $\pi_2$ as if from a fresh start, ignoring the information
gathered⁴ during the running of $\pi_1$.

Definition 29 (Adaptive Monotonicity (Alternate Version)) A function $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive
monotone with respect to distribution $P[\Phi]$ if for all policies $\pi$ and $\pi'$, it holds that $f_{\text{avg}}(\pi) \le f_{\text{avg}}(\pi' @ \pi)$,
where $f_{\text{avg}}(\pi) := \mathbb{E}[f(E(\pi, \Phi), \Phi)]$ is defined w.r.t. $P[\Phi]$.

Lemma 30 (Adaptive Monotonicity Equivalence) Fix a function $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$. Then $\Delta(e \mid \psi) \ge 0$
for all $\psi$ with $P[\psi] > 0$ and all $e \in E$ if and only if for all policies $\pi$ and $\pi'$, $f_{\text{avg}}(\pi) \le f_{\text{avg}}(\pi' @ \pi)$.

Proof Fix policies $\pi$ and $\pi'$. We begin by proving $f_{\text{avg}}(\pi' @ \pi) = f_{\text{avg}}(\pi @ \pi')$. Fix any $\Phi$ and note
that $E(\pi' @ \pi, \Phi) = E(\pi', \Phi) \cup E(\pi, \Phi) = E(\pi @ \pi', \Phi)$. Hence $f_{\text{avg}}(\pi' @ \pi) = \mathbb{E}[f(E(\pi' @ \pi, \Phi), \Phi)] =
\mathbb{E}[f(E(\pi @ \pi', \Phi), \Phi)] = f_{\text{avg}}(\pi @ \pi')$. So the $f_{\text{avg}}(\pi) \le f_{\text{avg}}(\pi' @ \pi)$ condition can be replaced with the
condition $f_{\text{avg}}(\pi) \le f_{\text{avg}}(\pi @ \pi')$.

4. Technically, if under any realization $\Phi$ policy $\pi_2$ selects an item that $\pi_1$ previously selected, then $\pi_1 @ \pi_2$ cannot be written as a
function from a set of partial realizations to $E$, i.e., it is not a policy. This can be amended by allowing partial realizations to be
multisets over elements of $E \times O$, so that, e.g., if $e$ is played twice then $(e, \Phi(e))$ appears twice in $\psi$. However, in the interest of
readability we will avoid this more cumbersome multiset formalism, and abuse notation slightly by calling $\pi_1 @ \pi_2$ a policy. This
issue arises whenever we run some policy and then run another from a fresh start.
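We first prove the forward direction. Suppose $\Delta(e \mid \psi) \ge 0$ for all $\psi$ and all $e \in E$. Note the expression
$f_{\text{avg}}(\pi @ \pi') - f_{\text{avg}}(\pi)$ can be written as a conical combination of (nonnegative) $\Delta(e \mid \psi)$ terms, i.e., for
some $\alpha \ge 0$, $f_{\text{avg}}(\pi @ \pi') - f_{\text{avg}}(\pi) = \sum_{\psi, e} \alpha_{\psi, e} \Delta(e \mid \psi)$. Hence $f_{\text{avg}}(\pi @ \pi') - f_{\text{avg}}(\pi) \ge 0$ and so
$f_{\text{avg}}(\pi) \le f_{\text{avg}}(\pi @ \pi') = f_{\text{avg}}(\pi' @ \pi)$.

We next prove the backward direction, in contrapositive form. Suppose $\Delta(e \mid \psi) < 0$ for some $\psi$ with
$P[\psi] > 0$ and $e \in E$. Let $e_1, \ldots, e_r$ be the items in $\mathrm{dom}(\psi)$ and define policies $\pi$ and $\pi'$ as follows.
For $i = 1, 2, \ldots, r$, both $\pi$ and $\pi'$ select $e_i$ and observe $\Phi(e_i)$. If either policy observes $\Phi(e_i) \ne \psi(e_i)$ it
immediately terminates, otherwise it continues. If $\pi$ succeeds in selecting all of $\mathrm{dom}(\psi)$ then it terminates. If $\pi'$
succeeds in selecting all of $\mathrm{dom}(\psi)$ then it selects $e$ and then terminates. We claim $f_{\text{avg}}(\pi @ \pi') - f_{\text{avg}}(\pi) < 0$.
Note that $E(\pi @ \pi', \Phi) = E(\pi, \Phi)$ unless $\Phi \sim \psi$, and if $\Phi \sim \psi$ then $E(\pi @ \pi', \Phi) = E(\pi, \Phi) \cup \{e\}$ and also
$E(\pi, \Phi) = \mathrm{dom}(\psi)$. Hence

$$\begin{aligned}
f_{\text{avg}}(\pi @ \pi') - f_{\text{avg}}(\pi) &= \mathbb{E}\left[f(E(\pi @ \pi', \Phi), \Phi) - f(E(\pi, \Phi), \Phi)\right] \\
&= \mathbb{E}\left[f(E(\pi @ \pi', \Phi), \Phi) - f(E(\pi, \Phi), \Phi) \mid \Phi \sim \psi\right] \cdot P[\Phi \sim \psi] \\
&= \mathbb{E}\left[f(\mathrm{dom}(\psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi\right] \cdot P[\Phi \sim \psi] \\
&= \Delta(e \mid \psi) \cdot P[\Phi \sim \psi]
\end{aligned}$$

The last term is negative, as $P[\Phi \sim \psi] > 0$ and $\Delta(e \mid \psi) < 0$ by assumption. Therefore $f_{\text{avg}}(\pi) >
f_{\text{avg}}(\pi @ \pi') = f_{\text{avg}}(\pi' @ \pi)$, which completes the proof. ■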



A.2 Adaptive Data Dependent Bounds with Costs 

The adaptive data dependent bound has the following generalization with costs. 

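Lemma 31 (The Adaptive Data Dependent Bound with Costs) Suppose we have made observations $\psi$
after selecting $\mathrm{dom}(\psi)$. Let $\pi^*$ be any policy. Then for adaptive monotone submodular $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$

$$\Delta(\pi^* \mid \psi) \le Z \le c(\pi^* \mid \psi) \max_e \left( \frac{\Delta(e \mid \psi)}{c(e)} \right)$$

where $Z = \max_w \left\{ \sum_{e \in E} w_e \Delta(e \mid \psi) : \sum_e c(e) w_e \le c(\pi^* \mid \psi) \text{ and } \forall e \in E, 0 \le w_e \le 1 \right\}$.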

Proof Order the items in $\mathrm{dom}(\psi)$ arbitrarily, and consider the policy $\pi$ that for each $e \in \mathrm{dom}(\psi)$ in order
selects $e$, terminating if $\Phi(e) \ne \psi(e)$ and proceeding otherwise, and, should it succeed in selecting all of
$\mathrm{dom}(\psi)$ without terminating (which occurs iff $\Phi \sim \psi$), then proceeds to run $\pi^*$ as if from a fresh start,
forgetting the observations in $\psi$. By construction the expected marginal benefit of running the $\pi^*$ portion
of $\pi$ conditioned on $\Phi \sim \psi$ equals $\Delta(\pi^* \mid \psi)$. For all $e \in E$, let $w(e) = P[e \in E(\pi, \Phi) \mid \Phi \sim \psi]$ be the
probability that $e$ is selected when running $\pi$, conditioned on $\Phi \sim \psi$. Whenever some $e \in E \setminus \mathrm{dom}(\psi)$ is
selected by $\pi$, the current partial realization $\psi'$ contains $\psi$ as a subrealization; hence adaptive submodularity
implies $\Delta(e \mid \psi') \le \Delta(e \mid \psi)$. It follows that the total contribution of $e$ to $\Delta(\pi^* \mid \psi)$ is upper bounded by
$w(e) \cdot \Delta(e \mid \psi)$. Summing over $e \in E \setminus \mathrm{dom}(\psi)$, we get a bound of $\Delta(\pi^* \mid \psi) \le \sum_{e \in E \setminus \mathrm{dom}(\psi)} w(e) \Delta(e \mid \psi)$.
Next, note that each $e \in E \setminus \mathrm{dom}(\psi)$ contributes $w(e) c(e)$ cost to $c(\pi^* \mid \psi)$. Hence it must be the case
that $\sum_{e \in E \setminus \mathrm{dom}(\psi)} w(e) c(e) \le c(\pi^* \mid \psi)$. Obviously, $w(e) \in [0, 1]$ for all $e$, since $w(e)$ is a probability.
Hence $\Delta(\pi^* \mid \psi) \le \sum_{e \in E \setminus \mathrm{dom}(\psi)} w(e) \Delta(e \mid \psi) \le Z$ because setting $w_e = w(e)$ is feasible for the linear
program for which $Z$ is the optimal value.

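To show $Z \le c(\pi^* \mid \psi) \max_e (\Delta(e \mid \psi) / c(e))$, consider any feasible solution $w$ to the linear program
defining $Z$. It attains objective value

$$\sum_{e \in E} w_e \Delta(e \mid \psi) = \sum_{e \in E} w_e c(e) \frac{\Delta(e \mid \psi)}{c(e)} \le \sum_{e \in E} w_e c(e) \max_{e'} \left( \frac{\Delta(e' \mid \psi)}{c(e')} \right) \le c(\pi^* \mid \psi) \max_{e} \left( \frac{\Delta(e \mid \psi)}{c(e)} \right) \qquad (26)$$

since $\sum_{e \in E} w_e c(e) \le c(\pi^* \mid \psi)$ by the feasibility of $w$.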



A simple greedy algorithm can be used to compute $Z$; we provide pseudocode for it in Algorithm 3. The
correctness of this algorithm is more readily discerned upon rewriting the linear program using variables $x_e =
c(e) w_e$ to obtain $Z = \max_x \left\{ \sum_{e \in E} x_e (\Delta(e \mid \psi) / c(e)) : \sum_e x_e \le c(\pi^* \mid \psi) \text{ and } \forall e \in E, 0 \le x_e \le c(e) \right\}$.
Intuitively, it is clear that to optimize $x$ we should shift mass towards variables with the highest $\Delta(e \mid \psi) / c(e)$
ratio. Clearly, any optimal solution has $\sum_e x_e = c(\pi^* \mid \psi)$. Moreover, in any optimal solution, $\Delta(e \mid \psi) / c(e) >
\Delta(e' \mid \psi) / c(e')$ implies $x_e = c(e)$ or $x_{e'} = 0$, since otherwise it would be possible to shift mass from $x_{e'}$
to $x_e$ and obtain an increase in objective value. If the $\Delta(e \mid \psi) / c(e)$ values are distinct for distinct items,
there will be a unique solution satisfying these constraints, which Algorithm 3 will compute. Otherwise, we
imagine perturbing each $\Delta(e \mid \psi)$ by independent random quantities $\epsilon_e$ drawn uniformly from $[0, \epsilon]$ to make
them distinct. This changes the optimum value by at most $|E| \epsilon$, which vanishes as we let $\epsilon$ tend towards zero.
Hence any solution satisfying $\sum_e x_e = c(\pi^* \mid \psi)$ and $\Delta(e \mid \psi) / c(e) > \Delta(e' \mid \psi) / c(e')$ implies $x_e = c(e)$ or
$x_{e'} = 0$ is optimal. Since Algorithm 3 outputs the value of such a solution, it is correct.



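Input: Groundset $E$; Partial realization $\Psi$; Costs $c : E \to \mathbb{N}$; Budget $C = c(\pi^* \mid \Psi)$; Conditional expected marginal benefits $\Delta(e \mid \Psi)$ for all $e \in E$.
Output: $Z = \max_w \left\{ \sum_{e \in E} w_e\,\Delta(e \mid \Psi) : \sum_{e} c(e) w_e \le C \text{ and } \forall e \in E,\ 0 \le w_e \le 1 \right\}$

begin
    Sort $E$ by $\Delta(e \mid \Psi)/c(e)$, so that $\Delta(e_1 \mid \Psi)/c(e_1) \ge \Delta(e_2 \mid \Psi)/c(e_2) \ge \cdots \ge \Delta(e_n \mid \Psi)/c(e_n)$;
    Set $w \leftarrow 0$; $i \leftarrow 0$; $a \leftarrow 0$; $z \leftarrow 0$; $e \leftarrow$ NULL;
    while $a < C$ and $i < n$ do
        $i \leftarrow i + 1$; $e \leftarrow e_i$;
        $w_e \leftarrow \min\{1, (C - a)/c(e)\}$;
        $a \leftarrow a + c(e) w_e$; $z \leftarrow z + w_e\,\Delta(e \mid \Psi)$;
    Output $z$;
end

Algorithm 3: Algorithm to compute the data dependent bound $Z$ of Lemma 31.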
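
The greedy computation of $Z$ is a fractional knapsack, and can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `data_dependent_bound` and the dictionary-based inputs are assumptions made here, not part of the article. It also assumes strictly positive costs, and it stops early at items with nonpositive benefit, which cannot increase the optimum of the linear program.

```python
from typing import Dict, Hashable


def data_dependent_bound(benefit: Dict[Hashable, float],
                         cost: Dict[Hashable, float],
                         budget: float) -> float:
    """Greedy fractional-knapsack value
    Z = max sum_e w_e * benefit[e]
    subject to sum_e cost[e] * w_e <= budget and 0 <= w_e <= 1.

    benefit[e] plays the role of Delta(e | Psi), cost[e] of c(e),
    and budget of C = c(pi* | Psi). Costs are assumed positive.
    """
    # Visit items in order of decreasing benefit-to-cost ratio.
    order = sorted(benefit, key=lambda e: benefit[e] / cost[e], reverse=True)
    spent = 0.0
    z = 0.0
    for e in order:
        if spent >= budget or benefit[e] <= 0:
            break
        # Take the item fully, or the fraction the remaining budget allows.
        w = min(1.0, (budget - spent) / cost[e])
        spent += cost[e] * w
        z += w * benefit[e]
    return z


if __name__ == "__main__":
    benefit = {"a": 6.0, "b": 5.0, "c": 1.0}
    cost = {"a": 2, "b": 2, "c": 1}
    # Item a is taken fully, item b at fraction 1/2: 6 + 2.5 = 8.5
    print(data_dependent_bound(benefit, cost, budget=3))
```

With a budget of 3 in this toy instance, the bound takes all of item `a` and half of item `b`, which is exactly the "shift mass towards the highest ratio" behaviour described above.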



A.3 The Max-Cover Objective 

With item costs, the Adaptive Stochastic Maximization problem becomes one of finding some

$$\pi^* \in \arg\max_{\pi} f_{\mathrm{avg}}(\pi_{[k]}) \tag{27}$$

where $k$ is a budget on the cost of selected items, and we define $f_{\mathrm{avg}}(\pi)$ for a randomized policy $\pi$ to be $f_{\mathrm{avg}}(\pi) := \mathbb{E}[f(E(\pi, \Phi), \Phi)]$ as before, where the expectation is now over both $\Phi$ and the internal randomness of $\pi$ which determines $E(\pi, \Phi)$ for each $\Phi$. We prove the following generalization of Theorem 5.

Theorem 32 Fix any $\alpha \ge 1$ and item costs $c : E \to \mathbb{N}$. If $f$ is adaptive monotone and adaptive submodular with respect to the distribution $\mathbb{P}[\Phi]$, and $\pi$ is an $\alpha$-approximate greedy policy, then for all policies $\pi^*$ and positive integers $\ell$ and $k$

$$f_{\mathrm{avg}}(\pi_{[\ell]}) > \left(1 - e^{-\ell/\alpha k}\right) f_{\mathrm{avg}}(\pi^*_{[k]}).$$

Proof The proof goes along the lines of the performance analysis of the greedy algorithm for maximizing a submodular function subject to a cardinality constraint of Nemhauser et al. (1978). An extension of that analysis to $\alpha$-approximate greedy algorithms, which is analogous to ours but for the nonadaptive case, is shown by Goundan and Schulz (2007). For brevity, we will assume without loss of generality that $\pi = \pi_{[\ell]}$ and $\pi^* = \pi^*_{[k]}$. Then for all $i$, $0 \le i < \ell$,

$$f_{\mathrm{avg}}(\pi^*) \le f_{\mathrm{avg}}(\pi_{[i]} @ \pi^*) \le f_{\mathrm{avg}}(\pi_{[i]}) + \alpha k \left( f_{\mathrm{avg}}(\pi_{[i+1]}) - f_{\mathrm{avg}}(\pi_{[i]}) \right). \tag{28}$$

The first inequality is due to the adaptive monotonicity of $f$ and Lemma 30, from which we may infer $f_{\mathrm{avg}}(\pi_2) \le f_{\mathrm{avg}}(\pi_1 @ \pi_2)$ for any $\pi_1$ and $\pi_2$. The second inequality may be obtained as a corollary of Lemma 31 as follows. Define a random partial realization $\Psi = \Psi(\Phi) := \{(e, \Phi(e)) : e \in E(\pi_{[i]}, \Phi)\}$, where $\Phi$ is distributed as $\mathbb{P}[\Phi]$. Consider $\Delta(\pi^* \mid \Psi)$, which equals the expected marginal benefit of the $\pi^*$ portion of $\pi_{[i]} @ \pi^*$ conditioned on $\Phi \sim \Psi$. Lemma 31 allows us to bound it as

$$\mathbb{E}[\Delta(\pi^* \mid \Psi)] \le \mathbb{E}[c(\pi^* \mid \Psi)] \cdot \max_e \left(\Delta(e \mid \Psi)/c(e)\right),$$

where the expectations are taken over the internal randomness of $\pi^*$, if there is any. Note that since $\pi^*$ has the form $\pi'_{[k]}$ for some $\pi'$ we know that for all $\Phi$, $\mathbb{E}[c(E(\pi^*, \Phi))] \le k$, where the expectation is again taken over the internal randomness of $\pi^*$. Hence $\mathbb{E}[c(\pi^* \mid \Psi)] \le k$ for all $\Psi$. It follows that $\mathbb{E}[\Delta(\pi^* \mid \Psi)] \le k \cdot \max_e \left(\Delta(e \mid \Psi)/c(e)\right)$. By definition of an $\alpha$-approximate greedy policy, $\pi$ obtains at least $(1/\alpha) \max_e \left(\Delta(e \mid \Psi)/c(e)\right) \ge \mathbb{E}[\Delta(\pi^* \mid \Psi)]/\alpha k$ expected marginal benefit per unit cost in this case. Next we remove the conditioning on $\Psi$ by taking expectations. For a random variable $X = X(\Psi)$, we let $\mathbb{E}_{\Psi}[X]$ denote the expectation of $X$ with respect to measure $\mathbb{P}[\Psi]$. Then



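$$f_{\mathrm{avg}}(\pi_{[i+1]}) - f_{\mathrm{avg}}(\pi_{[i]}) \;\ge\; \mathbb{E}_{\Psi}\left[ \frac{1}{\alpha} \max_e \left( \frac{\Delta(e \mid \Psi)}{c(e)} \right) \right] \;\ge\; \mathbb{E}_{\Psi}\left[ \frac{\mathbb{E}[\Delta(\pi^* \mid \Psi)]}{\alpha k} \right] \;=\; \frac{f_{\mathrm{avg}}(\pi_{[i]} @ \pi^*) - f_{\mathrm{avg}}(\pi_{[i]})}{\alpha k},$$

which may be rearranged to yield the second inequality in (28).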

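Now define $\Delta_i := f_{\mathrm{avg}}(\pi^*) - f_{\mathrm{avg}}(\pi_{[i]})$, so that (28) implies $\Delta_i \le \alpha k (\Delta_i - \Delta_{i+1})$, from which we infer $\Delta_{i+1} \le \left(1 - \frac{1}{\alpha k}\right) \Delta_i$ and hence $\Delta_\ell \le \left(1 - \frac{1}{\alpha k}\right)^{\ell} \Delta_0 < e^{-\ell/\alpha k} \Delta_0$, where for this last inequality we have used the fact that $1 - x < e^{-x}$ for all $x > 0$. Thus $f_{\mathrm{avg}}(\pi^*) - f_{\mathrm{avg}}(\pi_{[\ell]}) < e^{-\ell/\alpha k} \left( f_{\mathrm{avg}}(\pi^*) - f_{\mathrm{avg}}(\pi_{[0]}) \right) \le e^{-\ell/\alpha k} f_{\mathrm{avg}}(\pi^*)$, so $f_{\mathrm{avg}}(\pi) > \left(1 - e^{-\ell/\alpha k}\right) f_{\mathrm{avg}}(\pi^*)$.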
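
The last step of the proof can be checked numerically. The following sketch iterates the worst case of the recurrence $\Delta_{i+1} \le (1 - 1/\alpha k)\Delta_i$ and compares the result with the exponential bound $e^{-\ell/\alpha k}\Delta_0$. The function names `gap_after` and `exponential_bound` are assumptions introduced here for illustration only.

```python
import math


def gap_after(ell: int, alpha: float, k: int, gap0: float = 1.0) -> float:
    """Worst case of the recurrence gap_{i+1} <= (1 - 1/(alpha*k)) * gap_i,
    iterated ell times starting from gap0 (gap0 stands for Delta_0)."""
    gap = gap0
    for _ in range(ell):
        gap *= 1.0 - 1.0 / (alpha * k)
    return gap


def exponential_bound(ell: int, alpha: float, k: int, gap0: float = 1.0) -> float:
    """The bound e^{-ell/(alpha*k)} * gap0 used in the proof."""
    return math.exp(-ell / (alpha * k)) * gap0


if __name__ == "__main__":
    for ell, alpha, k in [(5, 1.0, 3), (10, 2.0, 4), (50, 1.5, 7)]:
        print(ell, alpha, k, gap_after(ell, alpha, k), exponential_bound(ell, alpha, k))
```

For every choice with $\alpha k \ge 1$ and $\ell \ge 1$ the first quantity stays strictly below the second, which is the inequality $1 - x < e^{-x}$ applied $\ell$ times.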



A.4 The Min-Cost-Cover Objective 

In this section, we provide arbitrary item cost generalizations of Theorem 10 and Theorem 11. With item costs the Adaptive Stochastic Minimum Cost Cover problem becomes one of finding, for some quota on utility $Q$,

$$\pi^* \in \arg\min_{\pi} c_{\mathrm{avg}}(\pi) \text{ such that } f(E(\pi, \Phi), \Phi) \ge Q \text{ for all } \Phi, \tag{29}$$

where $c_{\mathrm{avg}}(\pi) := \mathbb{E}[c(E(\pi, \Phi))]$. Without loss of generality, we may take a truncated version of $f$, namely $(A, \Phi) \mapsto \min\{Q, f(A, \Phi)\}$, and rephrase Problem (29) as finding

$$\pi^* \in \arg\min_{\pi} c_{\mathrm{avg}}(\pi) \text{ such that } \pi \text{ covers } \Phi \text{ for all } \Phi. \tag{30}$$

Hereby, recall that $\pi$ covers $\Phi$ if $\mathbb{E}[f(E(\pi, \Phi), \Phi)] = f(E, \Phi)$, where the expectation is over any internal randomness of $\pi$. We will consider only Problem (30) for the remainder. We also consider the worst-case variant of this problem, where we replace the expected cost $c_{\mathrm{avg}}(\pi)$ objective with the worst-case cost $c_{\mathrm{wc}}(\pi) := \max_{\Phi} c(E(\pi, \Phi))$.

The definition of coverage (Definition 7 in §5.2 on page 10) requires no modification to handle item costs. Note, however, that coverage is all-or-nothing in the sense that covering a realization with probability less than one does not count as covering it. A corollary of this is that only items whose runs have finished help with coverage, whereas currently running items do not. For a simple example, consider the case where $E = \{e\}$, $c(e) = 2$, $f(A, \Phi) = |A|$ and policy $\pi$ that selects $e$ and then terminates. Then $\pi_{[1]}$ is a randomized policy which is $\pi$ with probability $\frac{1}{2}$, and is the empty policy with probability $\frac{1}{2}$, so $\mathbb{E}[f(E(\pi_{[1]}, \Phi), \Phi)] = \frac{1}{2} < 1 = f(E, \Phi)$ for each $\Phi$. Hence, even though half the time $\pi_{[1]}$ covers all realizations, it is counted as not covering any.

We begin with the approximation guarantee for the average-case policy cost with arbitrary item costs.

Theorem 33 Suppose $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive submodular and strongly adaptive monotone with respect to $\mathbb{P}[\Phi]$ and there exists $Q$ such that $f(E, \Phi) = Q$ for all $\Phi$. Let $\eta$ be any value such that $f(S, \Phi) > Q - \eta$ implies $f(S, \Phi) = Q$ for all $S$ and $\Phi$. Let $\delta = \min_{\Phi} \mathbb{P}[\Phi]$ be the minimum probability of any realization. Let $\pi^*_{\mathrm{avg}}$ be an optimal policy minimizing the expected number of items selected to guarantee every realization is covered. Let $\pi$ be an $\alpha$-approximate greedy policy with respect to the item costs. Then in general

$$c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln\left( \frac{Q}{\delta \eta} \right) + 1 \right)$$

and for self-certifying instances

$$c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln\left( \frac{Q}{\eta} \right) + 1 \right).$$

Note that if $\mathrm{range}(f) \subset \mathbb{Z}$, then $\eta = 1$ is a valid choice, so in this case $c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln(Q/\delta) + 1 \right)$ and $c_{\mathrm{avg}}(\pi) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln(Q) + 1 \right)$ for general and self-certifying instances, respectively.

Proof Consider running $\alpha$-approximate greedy policy $\pi$ to completion, i.e., until it covers the true realization. It starts off with $v_0 := \mathbb{E}[f(\emptyset, \Phi)] \ge 0$ reward in expectation, and terminates with $Q$ reward. Along the way it will go through some sequence of partial realizations specifying its current observations, $\Psi_0 \subset \Psi_1 \subset \cdots \subset \Psi_\ell$, such that $\mathrm{dom}(\Psi_i) \setminus \mathrm{dom}(\Psi_{i-1})$ consists precisely of the $i$-th item selected by $\pi$. We call this sequence the trace $\tau = \tau(\Phi)$ of $\pi$. For a realization $\Phi$ and $x \in \mathbb{R}_{\ge 0}$, we define $\Psi(\Phi, x)$ as the partial realization seen by $\pi$ just before it achieved $x$ reward in expectation. Formally,

$$\Psi(\Phi, x) \in \arg\max \left\{ |\mathrm{dom}(\Psi)| : \Psi \in \tau(\Phi),\ \mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] < x \right\}. \tag{31}$$

Note that $\Psi(\Phi, x)$ exists for all $x \in (v_0, Q]$, and when it exists it is unique since no two elements of the trace have equally large domains. Also note that by the strong adaptive monotonicity of $f$, the function $i \mapsto \mathbb{E}[f(\mathrm{dom}(\Psi_i), \Phi) \mid \Phi \sim \Psi_i]$ must be nondecreasing for any trace $\Psi_0, \Psi_1, \ldots, \Psi_\ell$.

Our overall strategy will be to bound the expected cost $c_{\mathrm{avg}}(\pi)$ of $\pi$ by bounding the price it pays per unit of expected reward gained as it runs, and then integrating over the run. Note that Lemma 31 tells us that $\max_e \left(\Delta(e \mid \Psi)/c(e)\right) \ge \Delta(\pi^*_{\mathrm{avg}} \mid \Psi)/c(\pi^*_{\mathrm{avg}} \mid \Psi)$ for all $\Psi$. An $\alpha$-approximate greedy policy obtains at least $1/\alpha$ of this rate. Hence we may bound its price, $\theta$, by $\theta(\Psi) \le \alpha\, c(\pi^*_{\mathrm{avg}} \mid \Psi)/\Delta(\pi^*_{\mathrm{avg}} \mid \Psi)$.

Rather than try to bound the expected price as $\pi$ progresses in time, we will bound the expected price as it progresses in the expected reward it obtains, measured as $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]$ where $\Psi$ is the current partial realization. We next claim that $\Delta(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x)) \ge Q - x$ for all $\Phi$ and $x$. Note that $\mathbb{E}[f(\mathrm{dom}(\Psi(\Phi, x)), \Phi) \mid \Phi \sim \Psi(\Phi, x)] < x$ by definition of $\Psi(\Phi, x)$, and $f(E(\pi^*_{\mathrm{avg}}, \Phi), \Phi) = Q$ for all $\Phi$ since $\pi^*_{\mathrm{avg}}$ covers every realization. Since $Q$ is the maximum possible reward, if $\Delta(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x)) < Q - x$ then we can generate a violation of strong adaptive monotonicity by fixing some $\Phi \sim \Psi(\Phi, x)$, selecting $E(\pi^*_{\mathrm{avg}}, \Phi)$, and then selecting $\mathrm{dom}(\Psi(\Phi, x))$ to reduce the expected reward. Thus $\Delta(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x)) \ge Q - x$, and we infer

$$\theta(\Psi(\Phi, x)) \le \frac{\alpha\, c(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x))}{\Delta(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x))} \le \frac{\alpha\, c(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x))}{Q - x}. \tag{32}$$

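Next, we take an expectation over $\Phi$. Let $\theta(x) := \mathbb{E}[\theta(\Psi(\Phi, x))]$. Let $\Psi^x_1, \ldots, \Psi^x_r$ be the possible values of $\Psi(\Phi, x)$. Then because $\{\{\Phi : \Phi \sim \Psi^x_i\} : i = 1, 2, \ldots, r\}$ partitions the set of realizations,

$$\mathbb{E}[c(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x))] = \sum_{i=1}^{r} \mathbb{P}[\Psi^x_i] \sum_{\Phi} \mathbb{P}[\Phi \mid \Psi^x_i] \cdot c(\pi^*_{\mathrm{avg}} \mid \Phi) \tag{33}$$

$$= \sum_{\Phi} \mathbb{P}[\Phi] \cdot c(\pi^*_{\mathrm{avg}} \mid \Phi) \tag{34}$$

$$= c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}). \tag{35}$$

It follows that

$$\theta(x) \le \frac{\alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})}{Q - x}. \tag{36}$$

Let $c_{\mathrm{avg}}(\pi, Q')$ denote the expected cost to obtain expected reward $Q'$. Then we can bound $c_{\mathrm{avg}}(\pi, Q')$ as

$$c_{\mathrm{avg}}(\pi, Q') = \int_{x=0}^{Q'} \theta(x)\,dx \le \int_{x=0}^{Q'} \frac{\alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})}{Q - x}\,dx = \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \ln\left( \frac{Q}{Q - Q'} \right). \tag{37}$$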
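
The closed form in Eq. (37) can be sanity-checked by numerical integration. The sketch below compares a midpoint-rule estimate of $\int_0^{Q'} \frac{s}{Q - x}\,dx$ with $s \ln(Q/(Q - Q'))$, where $s$ stands in for the constant $\alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})$. The function names and the sample values of $Q$, $Q'$ and $s$ are assumptions chosen purely for illustration.

```python
import math


def price_integral(Q: float, Q_prime: float, scale: float = 1.0,
                   steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of scale / (Q - x) over [0, Q_prime].

    Requires 0 <= Q_prime < Q so that the integrand stays finite.
    """
    h = Q_prime / steps
    return sum(scale / (Q - (j + 0.5) * h) for j in range(steps)) * h


def closed_form(Q: float, Q_prime: float, scale: float = 1.0) -> float:
    """The closed form scale * ln(Q / (Q - Q_prime)) from Eq. (37)."""
    return scale * math.log(Q / (Q - Q_prime))


if __name__ == "__main__":
    Q, Q_prime, scale = 10.0, 9.5, 3.0
    print(price_integral(Q, Q_prime, scale), closed_form(Q, Q_prime, scale))
```

The integrand blows up as $Q' \to Q$, which is why the proof stops the integration at $Q' < Q$ and treats the last step of the policy separately.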



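We now use slightly different analyses for general instances and for self-certifying instances. We begin with general instances. For these, we set $Q' = Q - \delta\eta$ and use a more refined argument to bound the cost of getting the remaining expected reward. Fix $\Psi \in \mathrm{dom}(\pi)$ and any $\Phi' \sim \Psi$. We say $\Psi$ covers $\Phi'$ if $\pi$ covers $\Phi'$ by the time it observes $\Psi$. By definition of $\delta$ and $\eta$, if some $\Phi'$ is not covered by $\Psi$ then $Q - \mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \ge \delta\eta$. Hence the last item that $\pi$ selects, say upon observing $\Psi$, must increase its conditional expected value from $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le Q - \delta\eta$ to $Q$. By Eq. (32), it follows that for $x \in [Q - \delta\eta, Q]$,

$$\theta(\Psi(\Phi, x)) \le \frac{\alpha\, c(\pi^*_{\mathrm{avg}} \mid \Psi(\Phi, x))}{\delta\eta}.$$

As before, we may take the expectation over $\Phi$ to obtain $\theta(x) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})/\delta\eta$ for all $x \in [Q - \delta\eta, Q]$. This fact together with Eq. (37) yields

$$c_{\mathrm{avg}}(\pi) = c_{\mathrm{avg}}(\pi, Q) = c_{\mathrm{avg}}(\pi, Q - \delta\eta) + \int_{x = Q - \delta\eta}^{Q} \mathbb{E}[\theta(x)]\,dx$$

$$\le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \ln(Q/\delta\eta) + \int_{x = Q - \delta\eta}^{Q} \frac{\alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})}{\delta\eta}\,dx$$

$$= \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}}) \left( \ln(Q/\delta\eta) + 1 \right)$$

which completes the proof for general instances.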

For self-certifying instances we use a similar argument. For these instances we set $Q' = Q - \eta$, and argue that the last item that $\pi$ selects must increase its conditional expected value from at most $Q - \eta$ to $Q$. For suppose $\pi$ currently observes $\Psi$ and has not achieved conditional value $Q$, i.e., $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] < Q$. Then some $\Phi \sim \Psi$ is uncovered. Since the instance is self-certifying, every $\Phi$ with $\Phi \sim \Psi$ then has $f(\mathrm{dom}(\Psi), \Phi) < f(E, \Phi) = Q$. By definition of $\eta$, for each $\Phi$ with $\Phi \sim \Psi$ we then have $f(\mathrm{dom}(\Psi), \Phi) \le Q - \eta$, which implies $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] \le Q - \eta$. Reasoning analogously as with general instances, we may derive from this that $\theta(x) \le \alpha\, c_{\mathrm{avg}}(\pi^*_{\mathrm{avg}})/\eta$ for all $x \in [Q - \eta, Q]$. Computing $c_{\mathrm{avg}}(\pi) = c_{\mathrm{avg}}(\pi, Q') + \int_{x = Q'}^{Q} \mathbb{E}[\theta(x)]\,dx$ as before gives us the claimed approximation ratio for self-certifying instances, and completes the proof. ■



Next we consider the worst-case cost. We generalize Theorem 11 by incorporating arbitrary item costs.

Theorem 34 Suppose $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ is adaptive monotone and adaptive submodular with respect to $\mathbb{P}[\Phi]$, and let $\eta$ be any value such that $f(S, \Phi) > f(E, \Phi) - \eta$ implies $f(S, \Phi) = f(E, \Phi)$ for all $S$ and $\Phi$. Let $\delta = \min_{\Phi} \mathbb{P}[\Phi]$ be the minimum probability of any realization. Let $\pi^*_{\mathrm{wc}}$ be the optimal policy minimizing the worst-case cost $c_{\mathrm{wc}}(\cdot)$ while guaranteeing that every realization is covered. Let $\pi$ be an $\alpha$-approximate greedy policy with respect to the item costs. Finally, let $Q := \mathbb{E}[f(E, \Phi)]$ be the maximum possible expected reward. Then

$$c_{\mathrm{wc}}(\pi) \le \alpha\, c_{\mathrm{wc}}(\pi^*_{\mathrm{wc}}) \left( \ln\left( \frac{Q}{\delta\eta} \right) + 1 \right).$$



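Proof Let $\pi$ be an $\alpha$-approximate greedy policy. Let $k = c_{\mathrm{wc}}(\pi^*_{\mathrm{wc}})$, $\ell = \alpha k \ln(Q/\delta\eta)$, and apply Theorem 32 with these parameters to yield

$$f_{\mathrm{avg}}(\pi_{[\ell]}) > \left(1 - e^{-\ell/\alpha k}\right) f_{\mathrm{avg}}(\pi^*_{\mathrm{wc}}) = \left(1 - \frac{\delta\eta}{Q}\right) f_{\mathrm{avg}}(\pi^*_{\mathrm{wc}}). \tag{38}$$

Since $\pi^*_{\mathrm{wc}}$ covers every realization by assumption, $f_{\mathrm{avg}}(\pi^*_{\mathrm{wc}}) = \mathbb{E}[f(E, \Phi)] = Q$, so rearranging terms of Eq. (38) yields $Q - f_{\mathrm{avg}}(\pi_{[\ell]}) < \delta\eta$. Since $f_{\mathrm{avg}}(\pi_{[\ell]}) \le f_{\mathrm{avg}}(\pi_{[\ell\rightarrow]})$ by the adaptive monotonicity of $f$, it follows that $Q - f_{\mathrm{avg}}(\pi_{[\ell\rightarrow]}) < \delta\eta$. By definition of $\delta$ and $\eta$, if some $\Phi$ is not covered by $\pi_{[\ell\rightarrow]}$ then $Q - f_{\mathrm{avg}}(\pi_{[\ell\rightarrow]}) \ge \delta\eta$. Thus $Q - f_{\mathrm{avg}}(\pi_{[\ell\rightarrow]}) < \delta\eta$ implies $Q - f_{\mathrm{avg}}(\pi_{[\ell\rightarrow]}) = 0$, meaning $\pi_{[\ell\rightarrow]}$ covers every realization.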

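We next claim that $\pi_{[\ell\rightarrow]}$ has worst-case cost at most $\ell + \alpha k$. It is sufficient to show that the final item executed by $\pi_{[\ell\rightarrow]}$ has cost at most $\alpha k$ for any realization. As we will prove, this follows from the facts that $\pi$ is an $\alpha$-approximate greedy policy and $\pi^*_{\mathrm{wc}}$ covers every realization at cost at most $k$. The data dependent bound, Lemma 31 on page 36, guarantees that

$$\max_e \left( \frac{\Delta(e \mid \Psi)}{c(e)} \right) \ge \frac{\Delta(\pi^*_{\mathrm{wc}} \mid \Psi)}{c(\pi^*_{\mathrm{wc}} \mid \Psi)} \ge \frac{\Delta(\pi^*_{\mathrm{wc}} \mid \Psi)}{k}. \tag{39}$$

Suppose $\Psi \in \mathrm{dom}(\pi)$. We would like to say that $\max_e \Delta(e \mid \Psi) \le \Delta(\pi^*_{\mathrm{wc}} \mid \Psi)$. Supposing this is true, any item $e$ with cost $c(e) > \alpha k$ must have $\Delta(e \mid \Psi)/c(e) < \Delta(\pi^*_{\mathrm{wc}} \mid \Psi)/\alpha k$, and hence cannot be selected by any $\alpha$-approximate greedy policy upon observing $\Psi$ by Eq. (39), and thus the final item executed by $\pi_{[\ell\rightarrow]}$ has cost at most $\alpha k$ for any realization. So we next show that $\max_e \Delta(e \mid \Psi) \le \Delta(\pi^*_{\mathrm{wc}} \mid \Psi)$. Towards this end, note that Lemma 35 implies

$$\max_e \Delta(e \mid \Psi) \le \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi] - \mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi], \tag{40}$$

and to prove $\max_e \Delta(e \mid \Psi) \le \Delta(\pi^*_{\mathrm{wc}} \mid \Psi)$ it suffices to show

$$\mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(E(\pi^*_{\mathrm{wc}}, \Phi) \cup \mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]. \tag{41}$$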

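Proving Eq. (41) is quite straightforward if $f$ is strongly adaptive monotone. Given that $f$ is only adaptive monotone, it requires some additional effort. So fix $A \subset E$ and let $\pi_A$ be a non-adaptive policy that selects all items in $A$ in some arbitrary order. Let $\mathcal{P} := \{\Psi : \mathrm{dom}(\Psi) = A\}$. Apply Lemma 35 with $\pi' = \pi_A @ \pi^*_{\mathrm{wc}}$ and any $\Psi \in \mathcal{P}$ to obtain

$$\mathbb{E}[f(E(\pi^*_{\mathrm{wc}}, \Phi) \cup A, \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi]. \tag{42}$$

Note that $\sum_{\Psi \in \mathcal{P}} \mathbb{P}[\Psi] \cdot \mathbb{E}[f(E(\pi^*_{\mathrm{wc}}, \Phi) \cup A, \Phi) \mid \Phi \sim \Psi] = f_{\mathrm{avg}}(\pi_A @ \pi^*_{\mathrm{wc}}) \ge f_{\mathrm{avg}}(\pi^*_{\mathrm{wc}}) = \mathbb{E}[f(E, \Phi)]$. Since we know $\mathbb{E}[f(E, \Phi)] = \sum_{\Psi \in \mathcal{P}} \mathbb{P}[\Psi] \cdot \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi]$, an averaging argument together with Eq. (42) then implies that for all $\Psi \in \mathcal{P}$

$$\mathbb{E}[f(E(\pi^*_{\mathrm{wc}}, \Phi) \cup A, \Phi) \mid \Phi \sim \Psi] = \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi]. \tag{43}$$

Since $\Psi$ was an arbitrary partial realization with $\mathrm{dom}(\Psi) = A$, and $A \subset E$ was arbitrary, fix $\Psi \in \mathrm{dom}(\pi)$ and let $A = \mathrm{dom}(\Psi)$. With these settings, Eq. (43) implies Eq. (41), and thus $\max_e \Delta(e \mid \Psi) \le \Delta(\pi^*_{\mathrm{wc}} \mid \Psi)$, and thus an $\alpha$-approximate greedy policy can never select an item with cost exceeding $\alpha k$, where $k = c_{\mathrm{wc}}(\pi^*_{\mathrm{wc}})$. Hence $c_{\mathrm{wc}}(\pi_{[\ell\rightarrow]}) - c_{\mathrm{wc}}(\pi_{[\ell]}) \le \alpha k$, and so $c_{\mathrm{wc}}(\pi_{[\ell\rightarrow]}) \le \ell + \alpha k$. This completes the proof. ■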



Lemma 35 Fix adaptive monotone submodular objective $f$. For any policy $\pi$ and any $\Psi \in \mathrm{dom}(\pi)$ we have

$$\mathbb{E}[f(E(\pi, \Phi), \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi].$$

Proof Augment $\pi$ to a new policy $\pi'$ as follows. Run $\pi$ to completion, and let $\Psi'$ be the partial realization consisting of all of the states it has observed. If $\Psi \subset \Psi'$, then proceed to select all the remaining items in $E$ in any order. Otherwise, if $\Psi \not\subset \Psi'$ then terminate. Then

$$\mathbb{E}[f(E(\pi, \Phi), \Phi) \mid \Phi \sim \Psi] \le \mathbb{E}[f(E(\pi', \Phi), \Phi) \mid \Phi \sim \Psi] = \mathbb{E}[f(E, \Phi) \mid \Phi \sim \Psi] \tag{44}$$

where the inequality is by repeated application of the adaptive monotonicity of $f$, and the equality is by construction. ■



In §5.2 we explained how the result of Feige (1998) implies there is no polynomial time $(1 - \epsilon) \ln(Q/\eta)$ approximation algorithm for self-certifying instances of Adaptive Stochastic Min Cost Cover, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$. Here we show the related result for general instances.

Lemma 36 For every constant $\epsilon > 0$, there is no $(1 - \epsilon) \ln(Q/\delta\eta)$ polynomial time approximation algorithm for general instances of Adaptive Stochastic Min Cost Cover, for either the average case objective $c_{\mathrm{avg}}(\cdot)$ or the worst-case objective $c_{\mathrm{wc}}(\cdot)$, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$.

Proof We offer a reduction from the Set Cover problem. Fix a Set Cover instance with ground set $U$ and sets $\{S_1, \ldots, S_m\} \subseteq 2^U$ with unit-cost sets. Fix $Q$, $\eta$ and $\delta$ such that $1/\delta$ and $Q/\eta$ are positive integers, and $Q/\delta\eta = |U|$. Let $E := \{S_1, S_2, \ldots, S_m\}$, and set the costs of all items to one. Partition $U$ into $1/\delta$ disjoint, equally sized subsets $U_1, U_2, \ldots, U_{1/\delta}$. Construct a realization $\Phi_i$ for each $U_i$. Let the set of states be $O = \{\mathrm{NULL}\}$. Hence $\Phi_i(e) = \mathrm{NULL}$ for all $i$ and $e$, so that no knowledge of the true realization is revealed by selecting items. We use a uniform distribution over realizations, i.e., $\mathbb{P}[\Phi_i] = \delta$ for all $i$. Finally, our objective is $f(C, \Phi_i) := \left| \bigcup_{S \in C} (S \cap U_i) \right|$, i.e., the number of elements in $U_i$ that we cover with sets in $C$. Since $|O| = 1$, every realization is consistent with every possible partial realization $\Psi$. Hence for any $\Psi$, we have $\mathbb{E}[f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi] = \delta \hat{f}(\mathrm{dom}(\Psi))$, where $\hat{f}(C) = \left| \bigcup_{S \in C} S \right|$ is the objective function of the original set cover instance. Since $\hat{f}$ is submodular, $f$ is adaptive submodular. Likewise, since $\hat{f}$ is monotone, and $|O| = 1$, $f$ is strongly adaptive monotone. Now, to cover any realization, we must obtain the maximum possible value for all realizations, which means selecting a collection of sets $C$ such that $\bigcup_{S \in C} S = U$. Conversely, any $C$ such that $\bigcup_{S \in C} S = U$ clearly covers $f$. Hence this instance of Adaptive Stochastic Min Cost Cover, with either the average case objective $c_{\mathrm{avg}}(\cdot)$ or the worst-case objective $c_{\mathrm{wc}}(\cdot)$, is equivalent to the original Set Cover instance. Therefore, the result from Feige (1998) implies that there is no polynomial time algorithm for obtaining a $(1 - \epsilon) \ln |U| = (1 - \epsilon) \ln(Q/\delta\eta)$ approximation for Adaptive Stochastic Min Cost Cover unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$. ■
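
The reduction can be made concrete on a toy Set Cover instance. The sketch below builds the blocks $U_i$, evaluates $f(C, \Phi_i)$, and checks on all subcollections that the expected value equals $\delta \hat{f}(C)$ and that covering every realization is the same as covering $U$. All function names and the example instance are assumptions introduced for illustration; exact rational arithmetic via `fractions.Fraction` is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def build_blocks(universe, num_parts):
    """Partition the universe into num_parts = 1/delta equally sized blocks U_i."""
    items = sorted(universe)
    assert len(items) % num_parts == 0, "blocks must be equally sized"
    size = len(items) // num_parts
    return [set(items[j * size:(j + 1) * size]) for j in range(num_parts)]


def f(chosen_sets, block):
    """f(C, Phi_i): number of elements of the block U_i covered by the sets in C."""
    return len(set().union(*chosen_sets) & block)


def f_hat(chosen_sets):
    """Objective of the original Set Cover instance: size of the union of C."""
    return len(set().union(*chosen_sets))


def expected_f(chosen_sets, blocks):
    """Expectation of f under the uniform distribution P[Phi_i] = delta."""
    delta = Fraction(1, len(blocks))
    return sum(delta * f(chosen_sets, b) for b in blocks)


def all_subcollections(sets):
    """Yield every subcollection C of the given list of sets."""
    for r in range(len(sets) + 1):
        yield from combinations(sets, r)


if __name__ == "__main__":
    sets = [{0, 1, 2}, {2, 3}, {4, 5}, {1, 5}]
    blocks = build_blocks(range(6), 3)  # delta = 1/3
    for C in all_subcollections(sets):
        assert expected_f(C, blocks) == Fraction(f_hat(C), 3)
    print("expected f equals delta * f_hat on every subcollection")
```

Because selecting an item reveals nothing here, a policy is just an ordering of sets, which is why the adaptive problem collapses to ordinary Set Cover.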



A.5 The Min-Sum Objective 

In this section we prove Theorem 12, which appears on page 13, in the case where the items have arbitrary costs. Our proof resembles the analogous proof of Streeter and Golovin (2007) for the non-adaptive min-sum submodular cover problem, and, like that proof, ultimately derives from an extremely elegant performance analysis of the greedy algorithm for min-sum set cover due to Feige et al. (2004).

The objective function $c_\Sigma(\cdot)$ generalized to arbitrary cost items uses $\pi_{[\leftarrow t]}$ in place of $\pi_{[t]}$ in the unit-cost definition:

$$c_\Sigma(\pi) := \sum_{t=0}^{\infty} \left( \mathbb{E}[f(E, \Phi)] - f_{\mathrm{avg}}(\pi_{[\leftarrow t]}) \right).$$

We will prove that any $\alpha$-approximate greedy policy $\pi$ achieves a $4\alpha$-approximation for the min-sum objective, i.e., $c_\Sigma(\pi) \le 4\alpha\, c_\Sigma(\pi^*)$ for all policies $\pi^*$. To do so, we require the following lemma.

Lemma 37 Fix an $\alpha$-approximate greedy policy $\pi$ for some adaptive monotone submodular function $f$ and let $s_i := \alpha \left( f_{\mathrm{avg}}(\pi_{[i+1]}) - f_{\mathrm{avg}}(\pi_{[i]}) \right)$. For any policy $\pi^*$ and nonnegative integers $i$ and $k$, we have

$$f_{\mathrm{avg}}(\pi^*_{[\leftarrow k]}) \le f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) + k \cdot s_i.$$

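Proof Fix $\pi$, $\pi^*$, $i$, and $k$. By adaptive monotonicity $f_{\mathrm{avg}}(\pi^*_{[\leftarrow k]}) \le f_{\mathrm{avg}}(\pi_{[\leftarrow i]} @ \pi^*_{[\leftarrow k]})$. We next aim to prove

$$f_{\mathrm{avg}}(\pi_{[\leftarrow i]} @ \pi^*_{[\leftarrow k]}) \le f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) + k \cdot s_i \tag{46}$$

which is sufficient to complete the proof.

Figure 6: An illustration of the inequality $\int_{x=0}^{\infty} h(x)\,dx \ge \sum_{i \ge 0} x_i (y_i - y_{i+1})$.

Towards this end, define a random partial realization $\Psi = \Psi(\Phi) := \{(e, \Phi(e)) : e \in E(\pi_{[\leftarrow i]}, \Phi)\}$, where $\Phi$ is distributed as $\mathbb{P}[\Phi]$. Consider $\Delta(\pi^*_{[\leftarrow k]} \mid \Psi)$, which equals the expected marginal benefit of the $\pi^*_{[\leftarrow k]}$ portion of $\pi_{[\leftarrow i]} @ \pi^*_{[\leftarrow k]}$ conditioned on $\Phi \sim \Psi$. Lemma 31 allows us to bound it as

$$\mathbb{E}\left[\Delta(\pi^*_{[\leftarrow k]} \mid \Psi)\right] \le \mathbb{E}\left[c(\pi^*_{[\leftarrow k]} \mid \Psi)\right] \cdot \max_e \left( \frac{\Delta(e \mid \Psi)}{c(e)} \right),$$

where the expectations are taken over the internal randomness of $\pi^*_{[\leftarrow k]}$, if there is any. Note that for all $\Phi$, $\mathbb{E}[c(E(\pi^*_{[\leftarrow k]}, \Phi))] \le k$, where the expectation is again taken over the internal randomness of $\pi^*_{[\leftarrow k]}$. Hence $\mathbb{E}[c(\pi^*_{[\leftarrow k]} \mid \Psi)] \le k$ for all $\Psi$. It follows that $\mathbb{E}[\Delta(\pi^*_{[\leftarrow k]} \mid \Psi)] \le k \cdot \max_e \left(\Delta(e \mid \Psi)/c(e)\right)$. By definition of an $\alpha$-approximate greedy policy, $\pi$ obtains at least $(1/\alpha) \max_e \left(\Delta(e \mid \Psi)/c(e)\right) \ge \mathbb{E}[\Delta(\pi^*_{[\leftarrow k]} \mid \Psi)]/\alpha k$ expected marginal benefit per unit cost in this case. Hence, removing the conditioning on $\Psi$ by taking expectations,

$$f_{\mathrm{avg}}(\pi_{[i+1]}) - f_{\mathrm{avg}}(\pi_{[i]}) \ge \mathbb{E}_{\Psi}\left[ \frac{\mathbb{E}[\Delta(\pi^*_{[\leftarrow k]} \mid \Psi)]}{\alpha k} \right] = \frac{f_{\mathrm{avg}}(\pi_{[\leftarrow i]} @ \pi^*_{[\leftarrow k]}) - f_{\mathrm{avg}}(\pi_{[\leftarrow i]})}{\alpha k}.$$

Multiplying the above inequalities by $\alpha k$, and substituting in $s_i = \alpha \left( f_{\mathrm{avg}}(\pi_{[i+1]}) - f_{\mathrm{avg}}(\pi_{[i]}) \right)$, we conclude $k s_i \ge f_{\mathrm{avg}}(\pi_{[\leftarrow i]} @ \pi^*_{[\leftarrow k]}) - f_{\mathrm{avg}}(\pi_{[\leftarrow i]})$, which immediately yields Eq. (46) and concludes the proof. ■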



Using Lemma 37 together with a geometric argument developed by Feige et al. (2004), we now prove Theorem 12.

Proof of Theorem 12 Let $Q := \mathbb{E}[f(E, \Phi)]$ be the maximum possible expected reward, where the expectation is taken w.r.t. $\mathbb{P}[\Phi]$. Let $\pi$ be an $\alpha$-approximate greedy policy. Define $R_i := Q - f_{\mathrm{avg}}(\pi_{[i]})$ and define $P_i := Q - f_{\mathrm{avg}}(\pi_{[\leftarrow i]})$. Let $x_i := \frac{P_i}{2 s_i}$, let $y_i := \frac{R_i}{2}$, and let $h(x) := Q - f_{\mathrm{avg}}(\pi^*_{[\leftarrow x]})$. We claim $f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) \le f_{\mathrm{avg}}(\pi_{[i]})$ and so $P_i \ge R_i$. This clearly holds if $\pi_{[\leftarrow i]}$ is the empty policy, and otherwise $\pi$ can always select an item that contributes zero marginal benefit, namely an item it has already played previously. Hence an $\alpha$-approximate greedy policy $\pi$ can never select items with negative expected marginal benefit, and so $f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) \le f_{\mathrm{avg}}(\pi_{[i]})$. By Lemma 37, $f_{\mathrm{avg}}(\pi^*_{[\leftarrow x_i]}) \le f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) + x_i s_i$. Therefore

$$h(x_i) \ge Q - f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) - x_i s_i = P_i - \frac{P_i}{2} \ge \frac{R_i}{2} = y_i. \tag{47}$$

For similar reasons that $f_{\mathrm{avg}}(\pi_{[\leftarrow i]}) \le f_{\mathrm{avg}}(\pi_{[i]})$, we have $f_{\mathrm{avg}}(\pi_{[i-1]}) \le f_{\mathrm{avg}}(\pi_{[i]})$, and so the sequence $(y_1, y_2, \ldots)$ is non-increasing. The adaptive monotonicity and adaptive submodularity of $f$ imply that $h(x)$ is non-increasing. Informally, this is because otherwise, if $f_{\mathrm{avg}}(\pi^*_{[\leftarrow x]}) > f_{\mathrm{avg}}(\pi^*_{[\leftarrow x+1]})$ for some $x$, then the optimal policy must be sacrificing immediate rewards at time $x$ in exchange for greater returns later, and it can be shown that if such a strategy is optimal, then adaptive submodularity cannot hold. Eq. (47) and the monotonicity of $h$ and $i \mapsto y_i$ imply that $\int_{x=0}^{\infty} h(x)\,dx \ge \sum_{i \ge 0} x_i (y_i - y_{i+1})$ (see Figure 6). The left hand side is a lower bound for $c_\Sigma(\pi^*)$, and because $s_i = \alpha (R_i - R_{i+1})$ the right hand side simplifies to $\frac{1}{4\alpha} \sum_{i \ge 0} P_i = \frac{1}{4\alpha} c_\Sigma(\pi)$, proving $c_\Sigma(\pi) \le 4\alpha\, c_\Sigma(\pi^*)$. ■
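
The simplification of the right hand side follows from $x_i (y_i - y_{i+1}) = \frac{P_i}{2 s_i} \cdot \frac{R_i - R_{i+1}}{2} = \frac{P_i}{4\alpha}$. As a small sanity check, the sketch below evaluates both sides with exact rational arithmetic on made-up sequences. The function name `telescoped_sum` and the sample values are assumptions for illustration; the check also assumes $R_i > R_{i+1}$ so that every $s_i$ is nonzero.

```python
from fractions import Fraction


def telescoped_sum(R, P, alpha):
    """Compute sum_i x_i * (y_i - y_{i+1}) with
    s_i = alpha * (R_i - R_{i+1}), x_i = P_i / (2 s_i), y_i = R_i / 2.

    R must have one more entry than P and be strictly decreasing,
    so that every s_i is nonzero.
    """
    total = Fraction(0)
    for i in range(len(P)):
        s_i = alpha * (R[i] - R[i + 1])
        x_i = P[i] / (2 * s_i)
        total += x_i * (R[i] / 2 - R[i + 1] / 2)
    return total


if __name__ == "__main__":
    alpha = Fraction(2)
    R = [Fraction(8), Fraction(5), Fraction(3), Fraction(0)]
    P = [Fraction(9), Fraction(6), Fraction(3)]  # P_i >= R_i, as in the proof
    lhs = telescoped_sum(R, P, alpha)
    rhs = sum(P) / (4 * alpha)
    print(lhs, rhs)  # both equal 9/4
```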

A.6 Proof of Approximation Hardness in the Absence of Adaptive Submodularity



We now provide the proof of Theorem 20 which appears on page 28 in H 2 



Proof of Theorem 20 We construct a hard instance based on the following intuition. We make the algorithm go "treasure hunting". There is a set of $t$ locations $\{0, 1, \ldots, t - 1\}$, there is a treasure at one of these locations, and the algorithm gets unit reward if it finds it, and zero reward otherwise. There are $m$ "maps," each consisting of a cluster of $s$ bits, and each purporting to indicate where the treasure is, and each map is stored in a (weak) secret-sharing way, so that querying few bits of a map reveals nothing about where it says the treasure is. Moreover, all but one of the maps are fake, and there is a puzzle indicating which map is the correct one indicating the treasure's location. Formally, a fake map is one which is probabilistically independent of the location of the treasure, conditioned on the puzzle.

Our instance will have three types of elements, $E = E_T \uplus E_M \uplus E_P$, where $|E_T| = t$ encodes where the treasure is, $|E_M| = ms$ encodes the maps, and $|E_P| = n^3$ encodes the puzzle, where $m$, $t$, $s$ and $n$ are specified below. All outcomes are binary, $O = \{0, 1\}$. For all $e \in E_M \cup E_P$, $\mathbb{P}[\Phi(e) = 1] = .5$ independently. The conditional distribution $\mathbb{P}[\Phi(E_T) \mid \Phi(E_M \cup E_P)]$ will be deterministic as specified below. Our objective function $f$ is linear, and defined as follows:

$$f(A, \Phi) := \left| \{ e \in A \cap E_T : \Phi(e) = 1 \} \right|.$$

We now describe the puzzle, which is to compute $i(P) := (\mathrm{perm}(P) \bmod p) \bmod 2^{\ell}$ for a suitably sampled random matrix $P$, and suitable prime $p$ and integer $\ell$, where $\mathrm{perm}(P) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} P_{i\sigma(i)}$ is the permanent of $P$. We exploit Theorem 1.9 of Feige and Lund (1997) in which they show that if there exist constants $\eta, \delta > 0$ such that a randomized polynomial time algorithm can compute $(\mathrm{perm}(P) \bmod p) \bmod 2^{\ell}$ correctly with probability $2^{-\ell}(1 + 1/n^{\eta})$, where $P$ is drawn uniformly at random from $\{0, 1, 2, \ldots, p - 1\}^{n \times n}$, $p$ is any prime superpolynomial in $n$, and $\ell \le p\left(\frac{1}{2} - \delta\right)$, then $\mathrm{PH} = \mathrm{AM} = \Sigma_2^P$. To encode the puzzle, we fix a prime $p \in [2^{n-2}, 2^{n-1}]$ and use the $n^3$ bits of $\Phi(E_P)$ to sample $P = P(\Phi)$ (nearly) uniformly at random from $\{0, 1, 2, \ldots, p - 1\}^{n \times n}$ as follows. For a matrix $P \in \mathbb{Z}^{n \times n}$, we let $\mathrm{rep}(P) := \sum_{ij} P_{ij} \cdot p^{(i-1)n + (j-1)}$ define a base $p$ representation of $P$. Note $\mathrm{rep}(\cdot)$ is one-to-one for $n \times n$ matrices with entries in $\mathbb{Z}_p$, so we can define its inverse $\mathrm{rep}^{-1}(\cdot)$. The encoding $P(\Phi)$ interprets the bits $\Phi(E_P)$ as an integer $x$ in $[2^{n^3}]$, and computes $y = x \bmod (p^{n^2})$. If $x \le \lfloor 2^{n^3}/p^{n^2} \rfloor\, p^{n^2}$, then $P = \mathrm{rep}^{-1}(y)$. Otherwise, $P$ is the all zero matrix. This latter event occurs with probability at most $p^{n^2}/2^{n^3} \le 2^{-n^2}$, and in this case we simply suppose the algorithm under consideration finds the treasure and so gets unit reward. This adds $2^{-n^2}$ to its expected reward. So let us assume from now on that $P$ is drawn uniformly at random.
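
The sampling step is a form of rejection sampling on top of a base-$p$ encoding, and can be sketched as follows. The function names, the 0-based indexing, and the convention that $x$ ranges over $\{0, \ldots, 2^{b} - 1\}$ (so the acceptance test becomes a strict inequality) are assumptions made for this sketch; the toy parameters in the demo are far too small to satisfy the requirements on $p$ and serve only to show the mechanics.

```python
def rep(P, p):
    """Base-p representation of an n x n matrix with entries in {0, ..., p-1}:
    sum over (i, j) of P[i][j] * p**(i*n + j), using 0-based indices."""
    n = len(P)
    return sum(P[i][j] * p ** (i * n + j) for i in range(n) for j in range(n))


def rep_inverse(y, n, p):
    """Inverse of rep: read off the n*n base-p digits of y, least significant first."""
    digits = []
    for _ in range(n * n):
        y, d = divmod(y, p)
        digits.append(d)
    return [digits[i * n:(i + 1) * n] for i in range(n)]


def decode_puzzle(x, n, p, num_bits):
    """Map an integer x in [0, 2**num_bits) to a matrix.

    Values of x below the largest multiple of p**(n*n) are reduced modulo
    p**(n*n), which makes the resulting matrix exactly uniform on that range.
    The remaining values are mapped to the all-zero matrix.
    """
    modulus = p ** (n * n)
    threshold = (2 ** num_bits // modulus) * modulus
    if x < threshold:
        return rep_inverse(x % modulus, n, p)
    return [[0] * n for _ in range(n)]


if __name__ == "__main__":
    n, p, num_bits = 2, 3, 8  # toy sizes, not the article's parameter regime
    print(decode_puzzle(100, n, p, num_bits))
    print(rep(decode_puzzle(100, n, p, num_bits), p) == 100 % p ** (n * n))
```

In the toy setting each of the $3^4 = 81$ matrices is produced by exactly three accepted values of $x$, which is the uniformity property the proof relies on.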

Next we consider the maps. Partition $E_M = \biguplus_{i=1}^{m} M_i$ into $m$ maps $M_i$, each consisting of $s$ items. For each map $M_i$, partition its items into $s/\log_2 t$ groups of $\log_2 t$ bits each, and let $v_i \in \{0, 1, \ldots, t - 1\}$ be the XOR of these groups of bits. We say $M_i$ points to $v_i$ as the location of the treasure. A priori, each $v_i$ is uniformly distributed in $\{0, \ldots, t - 1\}$. For a particular realization of $\Phi(E_P \cup E_M)$, define $v(\Phi) := v_{i(P(\Phi))}$. We set $v(\Phi)$ to be the location of the treasure under realization $\Phi$, i.e., we label $E_T = \{e_0, e_1, \ldots, e_{t-1}\}$ and ensure $\Phi(e_j) = 1$ if $j = v(\Phi)$, and $\Phi(e) = 0$ for all other $e \in E_T$. Note the random variable $v = v(\Phi)$ is distributed uniformly at random in $\{0, 1, \ldots, t - 1\}$. Note that this still holds if we condition on the realizations of any set of $s/\log_2 t - 1$ items in a map.

Now consider the optimal policy with a budget of $k = n^3 + s + 1$ items to pick. Clearly, its reward can be at most 1. However, given a budget of $k$, a computationally unconstrained policy can exhaustively sample $E_P$, solve the puzzle (i.e., compute $i(P)$), read the correct map (i.e., exhaustively sample $M_{i(P)}$), decode the map (i.e., compute $v = v_{i(P)}$), and get the treasure (i.e., pick $e_v$), thereby obtaining a reward of one.
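
The XOR encoding of a map can be illustrated with a short sketch. It assumes that $t$ is a power of two, so that each group of $\log_2 t$ bits is a location. The helper `encode_map` is hypothetical and exists only to produce test inputs: in the construction above the map bits are simply independent fair coins and the location is derived from them, not the other way round.

```python
import secrets
from functools import reduce
from operator import xor


def decode_map(bits, group_size):
    """XOR together consecutive groups of group_size bits.

    Returns the location v in {0, ..., 2**group_size - 1} the map points to.
    """
    assert len(bits) % group_size == 0
    groups = []
    for start in range(0, len(bits), group_size):
        value = 0
        for b in bits[start:start + group_size]:
            value = (value << 1) | b  # most significant bit first
        groups.append(value)
    return reduce(xor, groups, 0)


def encode_map(location, num_groups, group_size):
    """Hypothetical helper: sample a random map whose groups XOR to location."""
    groups = [secrets.randbelow(2 ** group_size) for _ in range(num_groups - 1)]
    # The last group is chosen so that the XOR of all groups is the location.
    groups.append(reduce(xor, groups, location))
    return [(g >> (group_size - 1 - j)) & 1 for g in groups for j in range(group_size)]


if __name__ == "__main__":
    bits = encode_map(location=5, num_groups=4, group_size=3)
    print(bits, decode_map(bits, group_size=3))
```

Fixing all groups but one still leaves every location possible as the last group varies, which is the sense in which reading only part of a map reveals nothing about where it points.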

Now we give an upper bound on the expected reward $R$ of any randomized polynomial time algorithm $\mathcal{A}$ with a budget of $\beta k$ items, assuming $\Sigma_2^P \ne \mathrm{PH}$. Fix a small constant $\gamma > 0$, and set $s = n^3$ and $m = t = n^{1/\gamma}$. We suppose we give $\mathcal{A}$ the realizations $\Phi(E_P)$ for free. We also replace its budget of $\beta k$ items with a budget of $\beta k$ specifically for map items in $E_M$ and an additional budget of $\beta k$ specifically for the treasure locations in $E_T$. Obviously, this can only help it. As noted, if it selects less than $s/\log_2 t$ bits from the map $M_{i(P)}$ indicated by $P$, the distribution over $v_{i(P)}$ conditioned on those realizations is still uniform. Of course, knowledge of $v_i$ for $i \ne i(P)$ is useless for getting reward. Hence $\mathcal{A}$ can try at most $\beta k \log_2(t)/s = o(\beta k)$ maps in an attempt to find $M_{i(P)}$. Note that if we have a randomized algorithm which given a random $P$ drawn from $\{0, 1, 2, \ldots, p - 1\}^{n \times n}$ always outputs a set $S$ of integers of size $a$ such that $\mathbb{P}[i(P) \in S] \ge q$, then we can use it to construct a randomized algorithm that, given $P$, outputs an integer $x$ such that $\mathbb{P}[i(P) = x] \ge q/a$, simply by running the first algorithm and then selecting a random element of $S$. If $\mathcal{A}$ does not find $M_{i(P)}$, the distribution on the treasure's location is uniform given its knowledge. Hence its budget of $\beta k$ treasure locations can only earn it expected reward at most $\beta k/t$. Armed with these observations and Theorem 1.9 of Feige and Lund (1997) and our complexity theoretic assumptions, we infer $\mathbb{E}[R] \le o(\beta k) \cdot 2^{-\ell}(1 + 1/n^{\eta}) + \beta k/t + 2^{-n^2}$. Since $s = n^3$ and $m = t = n^{1/\gamma}$ and $\gamma = \Theta(1)$ and $\eta = 1$ and $\ell = \log_2 m$ and $k = n^3 + s + 1 = 2n^3 + 1$, we have

$$\mathbb{E}[R] \le \frac{\beta k}{t}(1 + o(1)) = 2\beta n^{3 - 1/\gamma}(1 + o(1)).$$

Next note that $|E| = t + ms + n^3 = n^{3 + 1/\gamma}(1 + o(1))$. Straightforward algebra shows that in order to ensure $\mathbb{E}[R] = o(\beta/|E|^{1-\epsilon})$, it suffices to choose $\gamma \le \epsilon/6$. Thus, under our complexity theoretic assumptions, any polynomial time randomized algorithm $\mathcal{A}$ with budget $\beta k$ achieves at most $o(\beta/|E|^{1-\epsilon})$ of the value obtained by the optimal policy with budget $k$, so the approximation ratio is $\omega(|E|^{1-\epsilon}/\beta)$. ■

A.7 A Symbol Table



Symbol                          Meaning

$E$, $e \in E$                  Ground set of items, and an individual item.
$O$, $o \in O$                  States an item may be in, or outcomes of selecting an item, and an individual state/outcome.
$\Phi$                          A realization, i.e., a function from items to states.
$\Psi$                          A partial realization, typically encoding the current set of observations; each $\Psi \subset E \times O$ is a partial mapping from items to states.
$\sim$                          The consistency relation: $\Phi \sim \Psi$ means $\Phi(e) = \Psi(e)$ for all $e \in \mathrm{dom}(\Psi)$.
$\mathbb{P}$                    The probability measure on realizations.
$\pi$                           A policy, which maps partial realizations to items.
$E(\pi, \Phi)$                  The items selected by $\pi$ when run under realization $\Phi$.
$\Delta(e \mid \Psi)$           The conditional expected marginal benefit of $e$ conditioned on $\Psi$: $\Delta(e \mid \Psi) := \mathbb{E}[f(\mathrm{dom}(\Psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]$.
$\Delta(\pi \mid \Psi)$         The conditional expected marginal benefit of policy $\pi$ conditioned on $\Psi$: $\Delta(\pi \mid \Psi) := \mathbb{E}[f(\mathrm{dom}(\Psi) \cup E(\pi, \Phi), \Phi) - f(\mathrm{dom}(\Psi), \Phi) \mid \Phi \sim \Psi]$.
$\Psi[e/o]$                     Shorthand for $\Psi \cup \{(e, o)\}$.
$k$                             Budget on the cost of selected item sets.
$\pi_{[k]}$                     A truncated policy. See Definition 4 on page 9 (unit costs) and Definition 26 on page 35.
$\pi_{[\leftarrow k]}$          A strictly truncated policy. See Definition 24 on page 35.
$\pi_{[k \rightarrow]}$         A laxly truncated policy. See Definition 25 on page 35.
$\pi @ \pi'$                    Policies $\pi$ and $\pi'$ concatenated together. See Definition 28 on page 35.
$f$                             An objective function, of type $f : 2^E \times O^E \to \mathbb{R}_{\ge 0}$ unless stated otherwise.
$f_{\mathrm{avg}}$              Average benefit: $f_{\mathrm{avg}}(\pi) := \mathbb{E}_{\Phi}[f(E(\pi, \Phi), \Phi)]$.
$c$                             Item costs $c : E \to \mathbb{N}$. Extended to sets via $c(S) := \sum_{e \in S} c(e)$.
$c_{\mathrm{avg}}$              Average cost of a policy: $c_{\mathrm{avg}}(\pi) := \mathbb{E}_{\Phi}[c(E(\pi, \Phi))]$.
$c_{\mathrm{wc}}$               Worst-case cost of a policy: $c_{\mathrm{wc}}(\pi) := \max_{\Phi} c(E(\pi, \Phi))$.
$c_{\Sigma}$                    Min-sum cost of a policy: $c_{\Sigma}(\pi) := \sum_{t=0}^{\infty} \left( \mathbb{E}_{\Phi}[f(E, \Phi)] - f_{\mathrm{avg}}(\pi_{[\leftarrow t]}) \right)$.
$c(\pi \mid \Psi)$              Conditional average policy cost: $c(\pi \mid \Psi) := \mathbb{E}_{\Phi}[c(E(\pi, \Phi)) \mid \Phi \sim \Psi]$.
$\alpha$                        Approximation factor for greedy optimization in an $\alpha$-approximate greedy policy.
$Q$                             Benefit quota. Often $Q = \mathbb{E}_{\Phi}[f(E, \Phi)]$.
$\eta$                          The coverage gap: $\eta = \max\{\eta' : f(S, \Phi) > Q - \eta' \text{ implies } f(S, \Phi) \ge Q \text{ for all } S \text{ and } \Phi\}$.
$\mathbf{1}_P$                  The indicator for proposition $P$, which equals one if $P$ is true and zero if $P$ is false.

Table 2: Important symbols and notations used in this article